SE::YouTube::Video - YouTube video scraper
Overview of SE::YouTube::Video scraper
YouTube video scraper. It collects the main data about a video, as well as its subtitles and comments. Queries are links to YouTube video pages. Video data can be collected in multi-threaded mode.
A-Parser's functionality allows you to save the parsing settings of the SE::YouTube::Video scraper for further use (presets), set up a parsing schedule, and much more.
You can save the results in the format and structure you need, thanks to the built-in powerful Template Toolkit template engine, which allows you to apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
List of collected data
- Video title and description
- Video duration
- Number of views, likes, and comments
- Link to preview
- Author's name, links to their avatar and channel, and number of subscribers
- Video subtitles (including display time information)
- List of tags
- List of comments (including replies to comments)
  - Comment ID and parent comment ID (for replies)
  - Author's name, link to their profile, and avatar
  - Comment text and publication time
- List of similar videos
  - Video title
Capabilities
- Choice of interface language
- Choice of subtitle language
- Specifying the number of comment pages (approximately 20 comments per page)
- Specifying the maximum number of reply pages for each comment (approximately 10 replies on the first page, approximately 50 on subsequent pages)
- Specifying the number of similar video pages (approximately 20 videos per page)
Use cases
- Collecting statistical data about videos on YouTube
- Parsing subtitles and comments as a source of text
- Finding similar videos
Subtitle language selection logic
The scraper uses the following priority (in descending order): author's, author's translated, generated, generated translated.
For example, if the scraper is set to parse English subtitles, then:
- if the video has author's English subtitles - author's subtitles will be parsed
- if the video has author's subtitles, but in another language - author's subtitles translated into English will be parsed
- if the video does not have author's subtitles, but has generated subtitles in English - generated subtitles will be parsed
- if the video does not have author's subtitles, but has generated subtitles in another language (because the video is in another language) - generated subtitles translated into English will be parsed
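The priority described above can be sketched in Python. This is only an illustration of the selection order, not the scraper's actual code: the `Track` type and the `pick_subtitles` function are hypothetical names, and the choice of the first available track as the translation source is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Track:
    """Hypothetical description of one subtitle track of a video."""
    language: str    # language code, e.g. "en"
    generated: bool  # True for automatically generated subtitles


def pick_subtitles(tracks: List[Track], wanted: str) -> Optional[Tuple[Track, bool]]:
    """Return (track, needs_translation), or None if the video has no subtitles.

    Order: author's, author's translated, generated, generated translated.
    """
    authored = [t for t in tracks if not t.generated]
    generated = [t for t in tracks if t.generated]
    for group in (authored, generated):
        # A track already in the wanted language is used as-is.
        for track in group:
            if track.language == wanted:
                return track, False
        # Otherwise a track from this group is translated (assumption: the first one).
        if group:
            return group[0], True
    return None
```

For example, a video with author's German subtitles and generated English subtitles yields the author's track with translation, because author's subtitles take precedence over generated ones.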
Comments are collected in a single thread, so parsing them can take a long time, especially with many comment pages and with reply parsing enabled. Setting a large number of reply pages is not recommended; 1-3 is usually enough. Disabling reply parsing entirely speeds up the process considerably.
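To get a feeling for how quickly the volume grows, the approximate page sizes listed earlier (about 20 comments per page, about 10 replies on the first reply page and about 50 on each subsequent one) can be turned into a rough upper bound. The function name `estimate_items` is hypothetical, and treating 0 reply pages as "reply parsing disabled" is an assumption.

```python
def estimate_items(comment_pages: int, reply_pages: int) -> dict:
    """Rough upper bound on collected items, based on approximate page sizes."""
    comments = comment_pages * 20  # about 20 comments per page
    if reply_pages <= 0:
        # Assumption: 0 pages means reply parsing is disabled.
        replies_per_comment = 0
    else:
        # About 10 replies on the first page, about 50 on each subsequent page.
        replies_per_comment = 10 + 50 * (reply_pages - 1)
    return {
        "comments": comments,
        "max_replies_per_comment": replies_per_comment,
        "max_total": comments + comments * replies_per_comment,
    }
```

With 5 comment pages and 2 reply pages the bound is already 100 comments with up to 60 replies each. Real totals are normally far lower, since most comments have few or no replies.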
- Specify links to videos as queries, for example: https://www.youtube.com/watch?v=5NPBIwQyPWE
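Because queries are links to video pages, a query list can be pre-checked before a task is started. The sketch below is not part of A-Parser. The helper `video_id` is a hypothetical name, and the sketch only recognizes the `watch?v=` link form that appears in this document. Whether the scraper accepts other link forms is not covered here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: only these host names are treated as YouTube video pages.
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def video_id(url: str) -> Optional[str]:
    """Return the video ID of a watch link, or None if the link has another form."""
    parts = urlparse(url.strip())
    if parts.scheme not in ("http", "https"):
        return None
    if parts.hostname not in YOUTUBE_HOSTS or parts.path != "/watch":
        return None
    ids = parse_qs(parts.query).get("v")
    return ids[0] if ids else None
```

Links for which `video_id` returns `None` can be reviewed by hand before they are added to the task.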
Result output options
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit template engine, which can output results in any format, including structured ones such as CSV or JSON.
By default, the link to the video, its title, number of likes, views, and comments will be output:
https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
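If the default output is post-processed outside A-Parser, the two-line records shown above can be read back with a small parser. This is a minimal sketch that assumes the exact layout of the sample above. The function name `parse_default_output` is hypothetical.

```python
import re
from typing import Dict, List

# First line of a record: "<link> - <title>"
HEADER = re.compile(r"^(https://www\.youtube\.com/watch\?v=\S+) - (.+)$")
# Second line of a record: "Views: N, likes: N, comments: N"
STATS = re.compile(r"^Views: (\d+), likes: (\d+), comments: (\d+)$")


def parse_default_output(text: str) -> List[Dict[str, object]]:
    """Turn default-format output into a list of dictionaries."""
    records: List[Dict[str, object]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            current = {"url": match.group(1), "title": match.group(2)}
            continue
        match = STATS.match(line)
        if match and current is not None:
            views, likes, comments = map(int, match.groups())
            current.update(views=views, likes=likes, comments=comments)
            records.append(current)
            current = None
    return records
```

For anything beyond quick checks, a structured result format such as CSV or JSON is easier to consume than parsing free text.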
Outputting video information to a CSV table
The built-in tools.CSVline tool creates correctly formed tabular documents that are ready for import into Excel or Google Sheets.
General result format:
[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]
Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags
The General result format uses the Template Toolkit template engine.
In the results file name, change the file extension to csv.
To make the "Initial text" option available in the Task Editor, activate "More options". In the "Initial text" field, enter the column names separated by commas and leave the second row empty.
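A results file produced this way can be read with Python's standard `csv` module. The sketch below assumes the header row shown above and assumes that the tags arrive as one comma-separated field, as produced by `p1.tags.format('$tag,')`. The function name `read_results` is hypothetical, and the date and duration columns are left as plain strings because their format is not specified here.

```python
import csv
import io

# Header row from the "Initial text" field shown above.
HEADER_LINE = ('Link,Author,"Publish date",Duration,Title,'
               '"Views count","Likes count","Comments count",Tags')


def read_results(csv_text: str) -> list:
    """Parse CSV results into dictionaries keyed by column name."""
    rows = []
    # DictReader takes the column names from the first row and skips empty rows.
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key in ("Views count", "Likes count", "Comments count"):
            row[key] = int(row[key])
        # The template emits "tag1,tag2," so the trailing empty item is dropped.
        row["Tags"] = [tag for tag in row["Tags"].split(",") if tag]
        rows.append(row)
    return rows
```

In practice the text would come from the results file, for example `read_results(open("results.csv", encoding="utf-8").read())`, where the file name is a placeholder.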
The result will contain a link to the video and subtitles in the specified language.
|Parameter Name||Default Value||Description|
|Interface language|| ||Interface language selection|
|Subtitles language|| ||Subtitles language selection|
|Comments pages count|| ||Number of comment pages|
|Pages count for replies|| ||Number of pages with replies to each comment|
|Pages count for related videos|| ||Number of pages with related videos|