SE::YouTube::Video - scraper for YouTube video data
Scraper Overview
Scraper for YouTube video data. It collects all the main data about a video, as well as its subtitles and comments. Use links to YouTube video pages as queries; you can collect such links with SE::YouTube. The scraper works in multi-threaded mode.
A-Parser lets you save the SE::YouTube::Video scraper settings for future use (presets), set a parsing schedule, and much more.
Results can be saved in whatever form and structure you need thanks to the powerful built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected Data
- Video title and description
- Video duration
- Number of views, likes, and comments
- Link to the preview
- Author's name, links to their avatar and channel, as well as the number of subscribers
- Subtitles for the video (including timing information)
- List of tags
- List of comments (including replies to comments)
  - Comment ID and parent comment ID (for replies)
  - Author's name, link to profile, and avatar
  - Comment text and time of publication
- List of related videos
  - Link
  - Video title
Capabilities
- Selection of interface language
- Selection of subtitles language
- Specifying the number of comment pages (approximately 20 comments per page)
- Specifying the maximum number of reply pages for each comment (approximately 10 replies on the first page, about 50 on the following pages)
- Specifying the number of related video pages (approximately 20 videos per page)
Use Cases
- Collecting statistical data about videos on YouTube
- Parsing subtitles and comments as a source of text
- Searching for related videos
Operating Notes
Subtitles Language Selection Logic
The scraper selects subtitles in the following order of priority (highest first): author's, author's translated, generated, generated translated.
For example, if the scraper is set to parse English subtitles, then:
- if the video has author's subtitles in English, the author's subtitles are parsed
- if the video has author's subtitles, but in another language, the author's subtitles translated into English are parsed
- if the video has no author's subtitles, but has generated subtitles in English, the generated subtitles are parsed
- if the video has no author's subtitles, and the generated subtitles are in another language (because the video is in another language), the generated subtitles translated into English are parsed
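The priority order above can be sketched in Python. This is only an illustration of the documented logic, not the scraper's actual implementation; the `Track` class and the `pick_subtitles` function are hypothetical names introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one subtitle track on a video."""
    language: str    # language code, e.g. "en"
    generated: bool  # True for automatically generated subtitles


def pick_subtitles(tracks: list[Track], wanted: str) -> Optional[tuple[str, Track]]:
    """Return (kind, source track) following the documented priority, or None."""
    authored = [t for t in tracks if not t.generated]
    generated = [t for t in tracks if t.generated]

    # 1. Author's subtitles in the requested language.
    for track in authored:
        if track.language == wanted:
            return ("author", track)
    # 2. Author's subtitles in another language, translated.
    if authored:
        return ("author translated", authored[0])
    # 3. Generated subtitles in the requested language.
    for track in generated:
        if track.language == wanted:
            return ("generated", track)
    # 4. Generated subtitles in another language, translated.
    if generated:
        return ("generated translated", generated[0])
    return None
```

For instance, a video with author's German subtitles and generated English subtitles yields "author translated" when English is requested, because author's subtitles take priority over generated ones.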
Parsing comments
Comments are collected in a single thread, so parsing them can take a long time, especially with a large number of pages and with replies enabled. Setting a large number of reply pages is not recommended: 1-3 is usually enough. You can also disable reply parsing entirely, which greatly speeds up the work.
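To get a feel for how quickly the volume grows, here is a rough upper-bound estimate in Python based on the approximate page sizes listed under Capabilities (about 20 comments per page, about 10 replies on the first reply page and about 50 on each following one). The function name is hypothetical, and the page-load count assumes one load per page, which is an assumption rather than documented behavior.

```python
def estimate_comment_volume(comment_pages: int, reply_pages: int) -> dict:
    """Rough upper bounds derived from the approximate page sizes in this document."""
    top_level = comment_pages * 20  # about 20 comments per page
    if reply_pages <= 0:
        replies_per_comment = 0     # reply parsing disabled
    else:
        # about 10 replies on the first page, about 50 on each following page
        replies_per_comment = 10 + 50 * (reply_pages - 1)
    return {
        "top_level_comments": top_level,
        "max_replies": top_level * replies_per_comment,
        # Assumption: every comment page and every reply page is one page load.
        "max_page_loads": comment_pages + top_level * max(reply_pages, 0),
    }


print(estimate_comment_volume(5, 3))
print(estimate_comment_volume(5, 0))
```

With 5 comment pages and 3 reply pages, the worst case is about 100 top-level comments, up to 11000 replies, and 305 page loads, compared with 5 page loads when replies are disabled.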
Queries
Specify links to videos as queries, for example:
https://www.youtube.com/watch?v=lWA2pjMjpBs
https://www.youtube.com/watch?v=EDwb9jOVRtU
https://www.youtube.com/watch?v=5NPBIwQyPWE
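When the list of links comes from another source, it may help to normalize and de-duplicate it before submitting it as queries. The sketch below is a standalone Python helper, not part of A-Parser; it accepts only the `watch?v=` form shown above, since whether the scraper accepts other URL forms is not stated here. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Extract the video ID from a https://www.youtube.com/watch?v=... link."""
    parts = urlparse(url.strip())
    if parts.hostname not in {"www.youtube.com", "youtube.com"}:
        return None
    if parts.path != "/watch":
        return None
    values = parse_qs(parts.query).get("v")
    return values[0] if values else None


def clean_queries(lines: Iterable[str]) -> list[str]:
    """Keep one canonical link per video, preserving the input order."""
    seen: set[str] = set()
    result: list[str] = []
    for line in lines:
        vid = video_id(line)
        if vid and vid not in seen:
            seen.add(vid)
            result.append(f"https://www.youtube.com/watch?v={vid}")
    return result
```

Lines that are not video links (empty lines, channel pages, arbitrary text) are silently dropped by `clean_queries`.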
Output results examples
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit template engine, which can output results in any form, including structured formats such as CSV or JSON.
Default output
Result format:
$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n
As a result, the video link, its title, and the number of views, likes, and comments are output:
https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
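If a results file in this default format needs to be processed later, it can be read back with a short Python script. This is a minimal sketch that assumes the exact two-line layout shown above and that all three counters are present as plain digits; `parse_default_output` is a hypothetical helper name.

```python
import re

# One record is two lines: "<link> - <title>" followed by the counters line.
RECORD = re.compile(
    r"^(?P<query>\S+) - (?P<title>.*)\n"
    r"Views: (?P<views>\d+), likes: (?P<likes>\d+), comments: (?P<comments>\d+)$",
    re.MULTILINE,
)


def parse_default_output(text: str) -> list:
    """Turn default-format output into a list of dictionaries."""
    records = []
    for match in RECORD.finditer(text):
        records.append({
            "query": match["query"],
            "title": match["title"],  # may itself contain " - "
            "views": int(match["views"]),
            "likes": int(match["likes"]),
            "comments": int(match["comments"]),
        })
    return records
```

The link contains no spaces, so the first " - " after it reliably separates the link from the title, even when the title itself contains " - ".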
Subtitles output
Result format:
$query\n$subtitles.format('$text ')\n\n
As a result, the video link and subtitles in the specified language will be displayed.
Output in a CSV table
The built-in tool tools.CSVline lets you create well-formed tables, ready for import into Excel or Google Sheets.
General result format:
[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]
File name:
$datefile.format().csv
Initial text:
Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags
The general result format uses the Template Toolkit template engine.
In the results file name, only the file extension needs to be changed to csv.
The "Initial text" option becomes available in the Task Editor after you activate "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
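The resulting file can also be read programmatically. The Python sketch below uses the standard `csv` module and the column names from the "Initial text" above. The sample row mixes the link, title, and counters from the default output example with placeholder values for author, date, duration, and tags, because the actual format of those fields is not described here. `read_results` is a hypothetical helper name.

```python
import csv
import io

# Header taken from the "Initial text" above; author, date, duration and tags
# in the data row are placeholders, not real scraper output.
SAMPLE = (
    'Link,Author,"Publish date",Duration,Title,'
    '"Views count","Likes count","Comments count",Tags\n'
    'https://www.youtube.com/watch?v=lWA2pjMjpBs,Example Author,2000-01-01,0:00,'
    'Rihanna - Diamonds,2104207258,10235971,394622,"tag1,tag2,"\n'
)


def read_results(handle) -> list:
    """Read rows, converting counters to int and the tags column to a list."""
    rows = []
    for row in csv.DictReader(handle):
        # The template joins tags as '$tag,' so the string ends with a comma;
        # dropping empty pieces removes that trailing separator.
        row["Tags"] = [tag for tag in row["Tags"].split(",") if tag]
        for key in ("Views count", "Likes count", "Comments count"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


for item in read_results(io.StringIO(SAMPLE)):
    print(item["Title"], item["Views count"], item["Tags"])
```

To read a real results file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample; the file encoding is an assumption and may need adjusting.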
Possible settings
Parameter name | Default value | Description |
---|---|---|
Interface language | English | Choice of interface language |
Subtitles language | English | Choice of subtitles language |
Comments pages count | 5 | Number of comment pages |
Pages count for replies | 3 | Number of reply pages for each comment |
Pages count for related videos | 5 | Number of pages with related videos |