SE::YouTube::Video - scraper for YouTube video data

Scraper Overview

This scraper parses the main data about a YouTube video, including its subtitles and comments. Use links to YouTube video pages as queries; you can collect such links with SE::YouTube. The scraper works in multi-threaded mode.

A-Parser lets you save the parsing settings of SE::YouTube::Video for future use (presets), set a parsing schedule, and much more.

Results can be saved in the form and structure you need thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.


Collected Data

Video title and description
Video duration
Number of views, likes, and comments
Link to the preview
Author's name, links to their avatar and channel, as well as the number of subscribers
Subtitles for the video (including timing information)
List of tags
List of comments (including replies to comments)
- Comment ID and parent comment ID (for replies)
- Author's name, link to profile, and avatar
- Comment text and time of publication
List of related videos
- Link
- Video title

Capabilities

Selection of interface language
Selection of subtitles language
Specifying the number of comment pages (approximately 20 comments per page)
Specifying the maximum number of reply pages for each comment (approximately 10 replies on the first page, about 50 on the following pages)
Specifying the number of related video pages (approximately 20 videos per page)

Use Cases

Collecting statistical data about videos on YouTube
Parsing subtitles and comments as a source of text
Searching for related videos

Work Features

Subtitles Language Selection Logic

The scraper uses the following priority (in descending order): author's, author's translated, generated, generated translated.

For example, if the scraper is set to parse English subtitles, then:

if the video has author's English subtitles - the author's will be parsed
if the video has author's subtitles, but in another language - the author's translated into English will be parsed
if the video has no author's subtitles, but there are generated ones in English - the generated will be parsed
if the video has no author's subtitles, and the generated ones are in another language (because the video is in another language) - the generated translated will be parsed
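
The priority above can be sketched as a small selection function. This is only an illustration of the documented order, not the scraper's actual code: the `Track` type, the `choose_subtitles` name, and the choice of the first available track as the translation source are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one subtitle track of a video."""
    lang: str        # language code of the track, e.g. "en"
    generated: bool  # True for automatically generated subtitles


def choose_subtitles(tracks: List[Track], wanted: str) -> Optional[Tuple[str, Track]]:
    """Return (kind, source track) following the documented priority order."""
    authors = [t for t in tracks if not t.generated]
    generated = [t for t in tracks if t.generated]

    # 1. Author's subtitles in the requested language.
    for track in authors:
        if track.lang == wanted:
            return ("author", track)
    # 2. Author's subtitles in another language, translated.
    #    Assumption: the first author's track is the translation source.
    if authors:
        return ("author_translated", authors[0])
    # 3. Generated subtitles in the requested language.
    for track in generated:
        if track.lang == wanted:
            return ("generated", track)
    # 4. Generated subtitles in another language, translated.
    if generated:
        return ("generated_translated", generated[0])
    # No subtitles of any kind.
    return None
```

For a video that has only German author's subtitles and English generated ones, a request for English yields "author_translated", because any author's track ranks above generated tracks.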

Parsing comments

Comments are collected in a single thread, so parsing them can take a long time, especially with a large number of comment pages and with replies enabled. Setting a large number of reply pages is not recommended: 1-3 is usually enough. You can also disable reply parsing completely, which greatly speeds up the work.
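
To get a feel for how quickly the work grows, the approximate page sizes from the Capabilities list can be turned into a rough upper bound. The sketch below is back-of-the-envelope arithmetic only. It assumes one sequential page load per page of comments or replies, and it treats a reply page count of 0 as "replies disabled"; neither assumption is confirmed by this page, and `estimate_comment_load` is a hypothetical helper name.

```python
# Approximate figures taken from the Capabilities list.
COMMENTS_PER_PAGE = 20
REPLIES_FIRST_PAGE = 10
REPLIES_NEXT_PAGES = 50


def estimate_comment_load(comment_pages: int, reply_pages: int) -> dict:
    """Rough upper bounds for one video; real numbers are usually lower."""
    comments = comment_pages * COMMENTS_PER_PAGE
    if reply_pages <= 0:
        # Assumption: 0 reply pages means replies are not parsed at all.
        replies_per_comment = 0
    else:
        replies_per_comment = (
            REPLIES_FIRST_PAGE + REPLIES_NEXT_PAGES * (reply_pages - 1)
        )
    return {
        "max_comments": comments,
        "max_replies": comments * replies_per_comment,
        # Assumption: one page load per comment page and per reply page.
        "max_page_loads": comment_pages + comments * max(reply_pages, 0),
    }


print(estimate_comment_load(5, 3))
print(estimate_comment_load(5, 0))
```

With the default 5 comment pages and 3 reply pages, the bound is 305 sequential page loads per video; with replies disabled it drops to 5, which illustrates why limiting reply pages has such a large effect.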

Queries

Specify links to videos as queries, for example:

https://www.youtube.com/watch?v=lWA2pjMjpBs
https://www.youtube.com/watch?v=EDwb9jOVRtU
https://www.youtube.com/watch?v=5NPBIwQyPWE
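
Before loading a long list of queries, it can help to filter out lines that are not video links. The sketch below accepts only the `watch?v=` form shown in the examples above; whether the scraper also accepts other link forms is not stated here, so treating them as invalid is an assumption of this example. `video_id` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch link, or None if the link does not fit."""
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return None
    if parts.netloc.lower() not in ("www.youtube.com", "youtube.com"):
        return None
    if parts.path != "/watch":
        return None
    ids = parse_qs(parts.query).get("v")
    return ids[0] if ids else None


queries = [
    "https://www.youtube.com/watch?v=lWA2pjMjpBs",
    "https://www.youtube.com/",
]
# Keep only the lines that look like video links.
valid = [q for q in queries if video_id(q) is not None]
print(valid)
```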

Output results examples

A-Parser supports flexible formatting of results thanks to the built-in Template Toolkit template engine, which lets it output results in any form, including structured formats such as CSV or JSON.

Default output

Result format:

$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n

As a result, the video link, its title, and the number of views, likes, and comments will be displayed:

https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
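
If the default output is processed further outside A-Parser, the two-line records can be read back with a couple of regular expressions. This is a minimal sketch that assumes the default result format shown above is used unchanged and that titles contain no line breaks; `parse_default_output` is a hypothetical helper, not part of A-Parser.

```python
import re

# First line: "<link> - <title>"; the link contains no spaces.
FIRST = re.compile(r"^(https?://\S+) - (.*)$")
# Second line: "Views: N, likes: N, comments: N".
SECOND = re.compile(r"^Views: (\d+), likes: (\d+), comments: (\d+)$")


def parse_default_output(text: str) -> list:
    """Turn pairs of output lines into dicts; pairs that do not match are skipped."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for first, second in zip(lines[0::2], lines[1::2]):
        head, stats = FIRST.match(first), SECOND.match(second)
        if head is None or stats is None:
            continue
        views, likes, comments = (int(n) for n in stats.groups())
        records.append({
            "query": head.group(1),
            "title": head.group(2),
            "views": views,
            "likes": likes,
            "comments": comments,
        })
    return records


sample = (
    "https://www.youtube.com/watch?v=5NPBIwQyPWE - "
    "Avril Lavigne - Complicated (Official Video)\n"
    "Views: 571331713, likes: 3959948, comments: 143597\n"
)
records = parse_default_output(sample)
print(records[0]["title"], records[0]["views"])
```

For anything beyond a quick check, a structured format such as CSV or JSON is a more robust choice than parsing free text.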

Subtitles output

Result format:

$query\n$subtitles.format('$text ')\n\n

As a result, the video link and subtitles in the specified language will be displayed.

Output in a CSV table

The built-in tool tools.CSVline lets you create valid CSV tables that are ready for import into Excel or Google Sheets.

General result format:

[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]

File name:

$datefile.format().csv

Initial text:

Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags

tip

The General result format uses the Template Toolkit template engine.

In the file name of the results, you only need to change the file extension to csv.

To make the "Initial text" option available in the Task Editor, activate "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
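
Once the file is written, it can be read back with any CSV library. The sketch below uses Python's standard `csv` module and the column names from the "Initial text" above. The sample row is made up for illustration: the link, title, and counters come from the default output example, while `<author>`, `<date>`, `<duration>`, and the tag names are placeholders, because the exact formats of those fields are not shown on this page.

```python
import csv
import io

HEADER = (
    'Link,Author,"Publish date",Duration,Title,'
    '"Views count","Likes count","Comments count",Tags\n'
)
# Illustrative row; placeholder values are marked with angle brackets.
SAMPLE_ROW = (
    "https://www.youtube.com/watch?v=lWA2pjMjpBs,<author>,<date>,<duration>,"
    'Rihanna - Diamonds,2104207258,10235971,394622,"tag1,tag2,"\n'
)


def read_results(fileobj) -> list:
    """Read result rows into dicts and split the Tags column into a list."""
    rows = []
    for row in csv.DictReader(fileobj):
        # The template joins tags as "tag1,tag2,", so drop the empty tail item.
        row["Tags"] = [tag for tag in row["Tags"].split(",") if tag]
        rows.append(row)
    return rows


rows = read_results(io.StringIO(HEADER + SAMPLE_ROW))
print(rows[0]["Title"], rows[0]["Tags"])
```

For a real results file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object; the UTF-8 encoding is an assumption and should match the encoding chosen for the results file.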

Possible settings

note

Common settings for all scrapers

Parameter name	Default value	Description
Interface language	`English`	Choice of interface language
Subtitles language	`English`	Choice of subtitles language
Comments pages count	`5`	Number of comment pages
Pages count for replies	`3`	Number of reply pages for each comment
Pages count for related videos	`5`	Number of pages with related videos
