SE::YouTube::Video - YouTube video data parser
Parser overview
With this parser, you can scrape all main video data, as well as subtitles and comments. Use links to YouTube video pages as queries. You can collect video links using SE::YouTube. The YouTube video parser collects all video data in multi-threaded mode.
A-Parser functionality allows you to save SE::YouTube::Video parsing settings for future use (presets), set up parsing schedules, and much more.
Results can be saved in any form and structure you need: the built-in Template Toolkit lets you apply additional logic to results and output data in various formats, including JSON, SQL, and CSV.
Collected data
- Video title and description
- Video duration
- Number of views, likes, and comments
- Link to the preview image
- Author's name, links to their avatar and channel, and subscriber count
- Video subtitles (including timestamp information)
- List of tags
- List of comments (including replies to comments)
  - Comment ID and parent comment ID (for replies)
  - Author's name, link to profile and avatar
  - Comment text and publication time
- List of related videos
  - Link and video title
  - Author and date
  - View count and video duration
- Information about video chapters ($chapters)
  - Title, start time in seconds, and link to the preview image
Capabilities
- Interface language selection
- Subtitle language selection
- Specifying the number of comment pages (approximately 20 comments per page)
- Specifying the maximum number of reply pages for each comment (approximately 10 replies on the first page, about 50 on subsequent ones)
- Specifying the number of related video pages (approximately 20 videos per page)
- Shorts support
Use cases
- Collecting statistical data about YouTube videos
- Parsing subtitles and comments as a source of text content
- Searching for related videos
Features
Subtitle language selection logic
The parser uses the following priority (in descending order): author-provided, author-translated, auto-generated, auto-generated translated.
For example, if the parser is set to parse English subtitles, then:
- if the video has author-provided English subtitles, those are parsed
- if the video has author-provided subtitles only in another language, their translation into English is parsed
- if the video has no author-provided subtitles but has auto-generated English ones, the auto-generated subtitles are parsed
- if the video has no author-provided subtitles and the auto-generated ones are in another language (because the video itself is in another language), the auto-generated subtitles translated into English are parsed
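The priority order above can be sketched as a small selection function. This is only an illustration of the documented rules, not the parser's actual code: the `Track` type, its fields, and the `pick_subtitles` name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one subtitle track available on a video."""
    lang: str   # language code, e.g. "en"
    auto: bool  # True for auto-generated tracks, False for author-provided


def pick_subtitles(tracks: List[Track], wanted: str) -> Optional[Tuple[str, Track]]:
    """Return (kind, source track) following the documented priority order."""
    author = [t for t in tracks if not t.auto]
    auto = [t for t in tracks if t.auto]

    # 1. Author-provided subtitles in the requested language.
    for t in author:
        if t.lang == wanted:
            return ("author-provided", t)
    # 2. Author-provided subtitles in another language, translated.
    if author:
        return ("author-translated", author[0])
    # 3. Auto-generated subtitles in the requested language.
    for t in auto:
        if t.lang == wanted:
            return ("auto-generated", t)
    # 4. Auto-generated subtitles in another language, translated.
    if auto:
        return ("auto-generated translated", auto[0])
    return None  # the video has no subtitle tracks at all


if __name__ == "__main__":
    tracks = [Track("de", auto=False), Track("en", auto=True)]
    print(pick_subtitles(tracks, "en"))
```

Note that a translated author track ranks above an auto-generated track in the requested language, which is what the priority list implies.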
Parsing comments
Comments are collected in a single thread, so parsing them can be quite time-consuming, especially when parsing a large number of pages and replies. It is not recommended to set a large number of reply pages; usually 1-3 is enough, or you can disable reply parsing entirely - this will significantly speed up the process.
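To get a feel for how quickly reply parsing grows, here is a rough estimate built from the approximate page sizes listed under Capabilities. It assumes one request per comment page and one request per reply page for every top-level comment; that request model is an assumption for illustration, and the function name is hypothetical.

```python
# Approximate page sizes quoted in the Capabilities list.
COMMENTS_PER_PAGE = 20
FIRST_REPLY_PAGE = 10
NEXT_REPLY_PAGE = 50


def estimate_comment_load(comment_pages: int, reply_pages: int) -> dict:
    """Rough upper bounds for one video; real numbers are usually lower."""
    top_level = comment_pages * COMMENTS_PER_PAGE
    if reply_pages > 0:
        replies_per_comment = FIRST_REPLY_PAGE + (reply_pages - 1) * NEXT_REPLY_PAGE
    else:
        replies_per_comment = 0
    # Assumption: each comment page and each reply page costs one request.
    max_requests = comment_pages + top_level * reply_pages
    return {
        "top_level_comments": top_level,
        "max_replies": top_level * replies_per_comment,
        "max_requests": max_requests,
    }


if __name__ == "__main__":
    print(estimate_comment_load(5, 3))  # default settings
    print(estimate_comment_load(5, 0))  # replies disabled
```

Under these assumptions the default settings (5 comment pages, 3 reply pages) allow up to 305 sequential requests per video, against 5 when replies are disabled.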
Queries
Video links must be specified as queries, for example:
https://www.youtube.com/watch?v=lWA2pjMjpBs
https://www.youtube.com/watch?v=EDwb9jOVRtU
https://www.youtube.com/watch?v=5NPBIwQyPWE
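When queries come from an external list, it can help to filter out anything that is not a video link before loading them into a task. The sketch below is a hypothetical pre-processing helper, not part of A-Parser. It accepts `/watch?v=...` links like the ones above, and also `/shorts/<id>` links on the assumption that Shorts are passed in that form.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube video link, or None if it is not one."""
    parts = urlparse(url.strip())
    host = (parts.hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return None
    if parts.path == "/watch":
        ids = parse_qs(parts.query).get("v")
        return ids[0] if ids else None
    if parts.path.startswith("/shorts/"):
        # Assumption: Shorts links have the form /shorts/<id>.
        rest = parts.path[len("/shorts/"):].strip("/")
        return rest or None
    return None


if __name__ == "__main__":
    for link in [
        "https://www.youtube.com/watch?v=lWA2pjMjpBs",
        "https://www.youtube.com/feed/trending",
    ]:
        print(link, "->", video_id(link))
```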
Output results examples
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit, which allows it to output results in any form, as well as structured formats like CSV or JSON.
Default output
Result format:
$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n
The result will display the video link, its title, and the number of views, likes, and comments:
https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
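If a results file in this default format needs further processing, it can be read back into records. The following is a minimal sketch that assumes every result is exactly the two-line shape shown above, with numeric counters. The function name is hypothetical.

```python
import re

# First line: "<link> - <title>"; the link itself contains no spaces.
HEAD = re.compile(r"^(\S+) - (.*)$")
# Second line: "Views: N, likes: N, comments: N".
STATS = re.compile(r"^Views: (\d+), likes: (\d+), comments: (\d+)$")


def parse_default_output(text: str) -> list:
    """Turn default-format results into a list of dicts."""
    records = []
    current = None  # (url, title) waiting for its statistics line
    for line in text.splitlines():
        stats = STATS.match(line)
        if stats and current:
            url, title = current
            views, likes, comments = map(int, stats.groups())
            records.append({
                "url": url,
                "title": title,
                "views": views,
                "likes": likes,
                "comments": comments,
            })
            current = None
            continue
        head = HEAD.match(line)
        if head:
            current = head.groups()
    return records


if __name__ == "__main__":
    sample = (
        "https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds\n"
        "Views: 2104207258, likes: 10235971, comments: 394622\n"
    )
    print(parse_default_output(sample))
```

Titles that contain " - " are kept whole, because only the first space-delimited token is treated as the link.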
Subtitles output
Result format:
$query\n$subtitles.format('$text ')\n\n
The result will display the video link and subtitles in the specified language.
Output in CSV table
The built-in tools.CSVline tool allows creating correct tabular documents ready for import into Excel or Google Sheets.
General result format:
[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]
File name:
$datefile.format().csv
Initial text:
Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags
The Template Toolkit engine is used in the General result format.
In the result file name, simply change the file extension to csv.
To make the "Initial text" option available in the Task Editor, you need to activate "More options". In "Initial text", write the column names separated by commas and make the second line empty.
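The resulting file can also be consumed by a script. The sketch below reads it with Python's standard `csv` module and splits the Tags column back into a list. It assumes the file uses standard CSV quoting with the header row shown above. The sample row is made-up illustrative data, and the function name is hypothetical.

```python
import csv
import io


def read_rows(handle) -> list:
    """Read CSV results; the Tags cell ('tag1,tag2,') becomes a list."""
    rows = []
    for row in csv.DictReader(handle):
        # The '$tag,' template leaves a trailing comma, so drop empty items.
        row["Tags"] = [tag for tag in row["Tags"].split(",") if tag]
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Illustrative data only; a real file would be opened with
    # open(path, newline="", encoding="utf-8").
    sample = (
        'Link,Author,"Publish date",Duration,Title,'
        '"Views count","Likes count","Comments count",Tags\n'
        'https://www.youtube.com/watch?v=lWA2pjMjpBs,Example Author,2020-01-01,'
        '225,"Example, title",100,10,1,"pop,music,"\n'
    )
    for row in read_rows(io.StringIO(sample)):
        print(row["Title"], row["Tags"])
```

Quoted cells are what keep commas inside titles and tag lists from being read as column separators.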
Possible settings
| Parameter name | Default value | Description |
|---|---|---|
| Interface language | English | Interface language selection |
| Subtitles language | English | Subtitle language selection |
| Comments pages count | 5 | Number of comment pages |
| Pages count for replies | 3 | Number of reply pages for each comment |
| Pages count for related videos | 5 | Number of pages with related videos |
| Login required is error | ☑ | Treat the "authorization required" message as an error and retry the query |