
SE::YouTube::Video - YouTube Video Data Scraper

Overview of the scraper

SE::YouTube::Video is a YouTube video data scraper. It collects all basic video data, as well as subtitles and comments. Queries must be links to YouTube video pages; such links can be collected with the SE::YouTube scraper. Videos are scraped in multithreaded mode, so many links can be processed in parallel.

A-Parser lets you save the SE::YouTube::Video scraper settings as presets for future use, set a scraping schedule, and much more.

Results can be saved in the format and structure you need, thanks to the powerful built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Collected data

  • Video title and description
  • Video duration
  • Number of views, likes, and comments
  • Link to the preview image
  • Author's name, links to their avatar and channel, as well as the number of subscribers
  • Video subtitles (including display time information)
  • List of tags
  • List of comments (including comment replies)
    • Comment ID and Parent Comment ID (for replies)
    • Author's name, profile link, and avatar
    • Comment text and publication time
  • List of related videos
    • Link and video title
    • Author and date
    • Number of views and video duration
  • Video chapter information ($chapters)
    • Title, start time in seconds, and a link to the preview image

Capabilities

  • Interface language selection
  • Subtitle language selection
  • Specifying the number of comment pages (approx. 20 comments per page)
  • Specifying the maximum number of reply pages for each comment (approx. 10 replies on the first page, approx. 50 on subsequent ones)
  • Specifying the number of related video pages (approx. 20 videos per page)
  • Shorts support

Use cases

  • Collecting statistical data about YouTube videos
  • Scraping subtitles and comments as a source of text data
  • Searching for related videos

Features

Subtitle language selection logic

The scraper picks subtitles in the following order of priority (highest first): original, original translated, generated, generated translated.

For example, if the scraper is set to scrape English subtitles, then:

  • if the video has original English subtitles, they will be scraped
  • if the video has original subtitles in a different language, the original subtitles translated into English will be scraped
  • if the video has no original subtitles but has generated English subtitles, the generated subtitles will be scraped
  • if the video has no original subtitles and the generated ones are in another language (because the video is in another language), the generated subtitles translated into English will be scraped
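The priority rule above can be expressed as a short selection function. This is a minimal Python sketch for illustration only, not A-Parser's implementation: the `SubtitleTrack` type and its fields are hypothetical, and picking the first track when several candidates need translation is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SubtitleTrack:
    """Hypothetical description of one subtitle track available for a video."""
    language: str    # language code, e.g. "en"
    generated: bool  # True for automatically generated subtitles


def choose_subtitles(
    tracks: List[SubtitleTrack], target: str
) -> Optional[Tuple[SubtitleTrack, bool]]:
    """Pick a track using the priority original > original translated >
    generated > generated translated.

    Returns (track, needs_translation), or None if the video has no subtitles.
    """
    original = [t for t in tracks if not t.generated]
    generated = [t for t in tracks if t.generated]
    # Original tracks are tried first; generated tracks only if there are none.
    for group in (original, generated):
        for track in group:
            if track.language == target:
                return track, False  # already in the target language
        if group:
            # Assumption: with several candidates, the first one is translated.
            return group[0], True
    return None
```

For example, `choose_subtitles([SubtitleTrack("de", False), SubtitleTrack("en", True)], "en")` returns the German original track with `needs_translation=True`, which matches the second case in the list: translated original subtitles win over generated English ones.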

Scraping comments

Comments are collected in a single thread, so scraping them can be time-consuming, especially with many comment pages and replies. It is recommended not to set a large number of reply pages; 1-3 is usually enough. You can also disable reply scraping entirely, which significantly speeds up the process.
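To see why reply pages dominate the run time, here is a rough worst-case estimate in Python built only from the approximate page sizes listed under Capabilities. The `comment_workload` function is hypothetical; it assumes one request per page and that every comment has enough replies to fill every reply page, so real numbers are usually much lower.

```python
from typing import Dict


def comment_workload(comment_pages: int, reply_pages: int) -> Dict[str, int]:
    """Worst-case comment workload for one video.

    Uses the approximate page sizes from this page and ASSUMES one request
    per page. Because comments are collected in a single thread, the request
    count is a rough proxy for scraping time.
    """
    COMMENTS_PER_PAGE = 20  # approx. comments per comment page
    FIRST_REPLY_PAGE = 10   # approx. replies on the first reply page
    LATER_REPLY_PAGE = 50   # approx. replies on each subsequent reply page

    top_level = comment_pages * COMMENTS_PER_PAGE
    if reply_pages > 0:
        replies_each = FIRST_REPLY_PAGE + LATER_REPLY_PAGE * (reply_pages - 1)
    else:
        replies_each = 0  # treated here as reply scraping disabled
    return {
        "top_level_comments": top_level,
        "max_replies": top_level * replies_each,
        # comment pages plus every reply page of every top-level comment
        "max_requests": comment_pages + top_level * reply_pages,
    }
```

With the default settings (5 comment pages, 3 reply pages) this gives at most 100 top-level comments, 11000 replies, and 305 sequential requests; with 1 reply page it drops to 105 requests, and with 0 reply pages to 5.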

Queries

Queries must be video links, for example:

https://www.youtube.com/watch?v=lWA2pjMjpBs
https://www.youtube.com/watch?v=EDwb9jOVRtU
https://www.youtube.com/watch?v=5NPBIwQyPWE
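If your links come from mixed sources, a small script can turn them into a clean query list in the form shown above. This Python sketch is a hypothetical helper, not part of A-Parser: it rewrites Shorts (`/shorts/ID`) and `youtu.be` links to `watch?v=ID` links, assuming the same video ID plays at the watch URL, and skips anything without a valid 11-character video ID. Whether the scraper also accepts `/shorts/` links directly is not covered here.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters from A-Z, a-z, 0-9, "_" and "-".
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def to_watch_url(link: str) -> Optional[str]:
    """Return https://www.youtube.com/watch?v=ID for a watch, Shorts or
    youtu.be link, or None if no valid video ID is found."""
    parsed = urlparse(link.strip())
    host = parsed.netloc.lower()
    video_id = None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            video_id = parsed.path.split("/")[2]
    elif host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    if video_id and VIDEO_ID.fullmatch(video_id):
        return f"https://www.youtube.com/watch?v={video_id}"
    return None


def build_queries(lines: Iterable[str]) -> List[str]:
    """Normalize links, drop non-video lines and remove duplicates (order kept)."""
    urls = (to_watch_url(line) for line in lines)
    return list(dict.fromkeys(url for url in urls if url))
```

The resulting list can be written one link per line and used as the query file.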

Output results examples

Thanks to the built-in Template Toolkit templating engine, A-Parser supports flexible result formatting: results can be output in an arbitrary form or in a structured format such as CSV or JSON.

Default output

Result format:

$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n

The result shows the video link and title, followed by the number of views, likes, and comments:

https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
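If you keep the default format, the result file can still be post-processed, for example to rank videos by views. This Python sketch is a hypothetical helper that assumes the file contains only records in exactly this two-line format and that titles contain no line breaks; it raises an error on any other counts line (for example an empty count). For anything more complex, a structured format such as CSV or JSON is safer.

```python
import re
from typing import Dict, List, Union

COUNTS = re.compile(r"Views: (\d+), likes: (\d+), comments: (\d+)")


def parse_default_output(text: str) -> List[Dict[str, Union[str, int]]]:
    """Parse results written with the default two-line format."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    # Each record is a "link - title" line followed by a counts line.
    for header, counts in zip(lines[0::2], lines[1::2]):
        # Links contain no spaces, so the first " - " separates link and title.
        query, title = header.split(" - ", 1)
        match = COUNTS.fullmatch(counts.strip())
        if match is None:
            raise ValueError(f"unexpected counts line: {counts!r}")
        views, likes, comments = (int(g) for g in match.groups())
        records.append({"query": query, "title": title, "views": views,
                        "likes": likes, "comments": comments})
    return records
```

On the sample output above, `max(parse_default_output(text), key=lambda r: r["views"])["title"]` returns `"Rihanna - Diamonds"`.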

Subtitle output

Result format:

$query\n$subtitles.format('$text ')\n\n

The result shows the video link followed by the subtitle text in the selected language, with all subtitle fragments joined by spaces.

Output to a CSV table

The built-in tools.CSVline tool lets you create properly formatted CSV files ready for import into Excel or Google Sheets.

General result format:

[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]

File name:

$datefile.format().csv

Initial text:

Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags

Tip

The General Result Format uses the Template Toolkit templating engine.

In the result file name, you just need to change the file extension to csv.

For the "Initial text" option to be available in the Job Editor, you need to activate "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
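A file produced with these settings can be loaded back for further analysis. This Python sketch uses the standard `csv` module and relies on the column names from "Initial text"; the `read_results` and `to_int` helpers are hypothetical. Because tags are written with `format('$tag,')`, the Tags cell ends with a comma, so empty items are dropped; empty count cells become `None`.

```python
import csv
from typing import Any, Dict, List, Optional, TextIO

COUNT_COLUMNS = ("Views count", "Likes count", "Comments count")


def to_int(value: str) -> Optional[int]:
    """Convert a count cell to int; empty cells become None."""
    return int(value) if value.strip() else None


def read_results(f: TextIO) -> List[Dict[str, Any]]:
    """Read the CSV file, using the header row from "Initial text"."""
    rows = []
    for row in csv.DictReader(f):  # DictReader skips blank lines
        # format('$tag,') leaves a trailing comma, so drop empty items.
        row["Tags"] = [tag for tag in (row["Tags"] or "").split(",") if tag]
        for column in COUNT_COLUMNS:
            row[column] = to_int(row[column] or "")
        rows.append(row)
    return rows
```

For example: `with open(path, newline="", encoding="utf-8") as f: rows = read_results(f)`, where `path` is the file name produced by `$datefile.format().csv` and UTF-8 encoding is an assumption.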

Possible settings

| Parameter name | Default value | Description |
|---|---|---|
| Interface language | English | Interface language selection |
| Subtitles language | English | Subtitle language selection |
| Comments pages count | 5 | Number of comment pages |
| Pages count for replies | 3 | Number of reply pages for each comment |
| Pages count for related videos | 5 | Number of pages with related videos |
| Login required is error | | Instructs the scraper to treat the "login required" (authorization required) message as an error and retry |