
SE::YouTube::Video - YouTube video data parser

Parser overview

YouTube video data parser. With this parser, you can scrape all the main video data, as well as subtitles and comments. Use links to YouTube video pages as queries; you can collect video links with the SE::YouTube parser. The YouTube video parser works in multi-threaded mode, so you can collect video data for many queries in parallel.

A-Parser functionality allows you to save SE::YouTube::Video parsing settings for future use (presets), set up parsing schedules, and much more.

Results can be saved in any form and structure you need, thanks to the powerful built-in Template Toolkit, which lets you apply additional logic to results and output data in various formats, including JSON, SQL, and CSV.

Collected data

  • Video title and description
  • Video duration
  • Number of views, likes, and comments
  • Link to the preview image
  • Author's name, links to their avatar and channel, and subscriber count
  • Video subtitles (including timestamp information)
  • List of tags
  • List of comments (including replies to comments)
    • Comment ID and parent comment ID (for replies)
    • Author's name, link to profile and avatar
    • Comment text and publication time
  • List of related videos
    • Link and video title
    • Author and date
    • View count and video duration
  • Information about video chapters ($chapters)
    • Title, start time in seconds, and link to the preview image
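Because every reply carries its parent comment ID, a flat list of collected comments can be regrouped into threads after export. The sketch below assumes the comments have already been loaded into Python with the hypothetical field names `id`, `parent_id`, `author`, and `text`; the parser's actual result keys may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record shape; the real result field names may differ.
@dataclass
class Comment:
    id: str
    parent_id: Optional[str]  # None for top-level comments
    author: str
    text: str


def group_replies(comments: list[Comment]) -> dict[str, list[Comment]]:
    """Map each top-level comment ID to the list of its replies."""
    threads: dict[str, list[Comment]] = {}
    # First pass: register every top-level comment, even those without replies.
    for c in comments:
        if c.parent_id is None:
            threads.setdefault(c.id, [])
    # Second pass: attach each reply to its parent comment.
    for c in comments:
        if c.parent_id is not None:
            threads.setdefault(c.parent_id, []).append(c)
    return threads


comments = [
    Comment("a", None, "Ann", "Great video"),
    Comment("b", "a", "Bob", "Agreed"),
    Comment("c", None, "Cid", "First!"),
]
for parent_id, replies in group_replies(comments).items():
    print(parent_id, [r.id for r in replies])
```

Two passes keep top-level comments that received no replies in the output with an empty list.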

Capabilities

  • Interface language selection
  • Subtitle language selection
  • Specifying the number of comment pages (approximately 20 comments per page)
  • Specifying the maximum number of reply pages for each comment (approximately 10 replies on the first page, about 50 on subsequent ones)
  • Specifying the number of related video pages (approximately 20 videos per page)
  • Shorts support

Use cases

  • Collecting statistical data about YouTube videos
  • Parsing subtitles and comments as a source of text content
  • Searching for related videos

Features

Subtitle language selection logic

The parser chooses subtitles using the following priority (highest first): author-provided, author-translated, auto-generated, auto-generated translated.

For example, if the parser is set to parse English subtitles, then:

  • if the video has author-provided English subtitles, the author-provided subtitles are parsed
  • if the video has author-provided subtitles only in another language, those subtitles translated into English (author-translated) are parsed
  • if the video has no author-provided subtitles but has auto-generated English subtitles, the auto-generated subtitles are parsed
  • if the video has no author-provided subtitles and its auto-generated subtitles are in another language (because the video itself is in another language), the auto-generated subtitles translated into English are parsed
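The priority rules above can be expressed as a small decision function. This is a sketch of the documented logic, not the parser's code; it assumes that a translation into the target language is always available whenever a source track exists, and the `None` result for a video with no subtitles at all is our assumption, since the documentation does not cover that case.

```python
from typing import Optional


def pick_subtitles(target: str, author_langs: set[str], auto_langs: set[str]) -> Optional[str]:
    """Return which subtitle kind would be chosen for the `target` language.

    author_langs: languages of author-provided subtitle tracks
    auto_langs:   languages of auto-generated subtitle tracks
    """
    if target in author_langs:
        return "author"              # author-provided, already in the target language
    if author_langs:
        return "author_translated"   # author-provided, translated to the target
    if target in auto_langs:
        return "auto"                # auto-generated in the target language
    if auto_langs:
        return "auto_translated"     # auto-generated, translated to the target
    return None                      # assumption: no subtitles available at all


print(pick_subtitles("en", {"de"}, {"de"}))
print(pick_subtitles("en", set(), {"fr"}))
```

Note that any author-provided track, even in another language, wins over auto-generated subtitles in the target language.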

Parsing comments

Comments are collected in a single thread, so parsing them can be time-consuming, especially with many comment pages and replies. Avoid setting a large number of reply pages: 1-3 is usually enough. You can also disable reply parsing entirely, which significantly speeds up the process.
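To see why reply pages dominate the workload, the approximate page sizes from the Capabilities list can be turned into a rough upper bound on how many items one video can yield. This sketch assumes every comment has enough replies to fill all of its reply pages, so real numbers are usually much lower.

```python
def estimate_comment_volume(comment_pages: int, reply_pages: int) -> tuple[int, int]:
    """Rough upper bound on (top-level comments, replies) for one video.

    Uses the approximate page sizes from the documentation: ~20 comments
    per page, ~10 replies on the first reply page and ~50 on each later one.
    """
    comments = comment_pages * 20
    if reply_pages <= 0:
        replies_per_comment = 0  # reply parsing disabled
    else:
        replies_per_comment = 10 + (reply_pages - 1) * 50
    return comments, comments * replies_per_comment


# Default settings (5 comment pages, 3 reply pages) versus a single reply page.
print(estimate_comment_volume(5, 3))
print(estimate_comment_volume(5, 1))
```

With the default settings the bound is 100 comments and up to 11,000 replies; dropping to one reply page lowers the reply bound to 1,000.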

Queries

Video links must be specified as queries, for example:

https://www.youtube.com/watch?v=lWA2pjMjpBs
https://www.youtube.com/watch?v=EDwb9jOVRtU
https://www.youtube.com/watch?v=5NPBIwQyPWE
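If your link list mixes URL styles, you may want to normalize it to the `watch?v=` form shown above before loading it as queries, which also makes deduplication easy. This is an optional preprocessing sketch, not something the parser requires: the accepted host names and the 11-character video ID pattern are assumptions based on common YouTube link formats, and whether the parser accepts `youtu.be` or `/shorts/` links directly is not covered here.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def normalize_video_url(url: str) -> Optional[str]:
    """Return a canonical watch URL, or None if no video ID is found."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    video_id = None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith("/shorts/"):
            video_id = parsed.path.split("/")[2]
    elif host == "youtu.be":
        video_id = parsed.path.lstrip("/")
    if video_id and _ID_RE.match(video_id):
        return f"https://www.youtube.com/watch?v={video_id}"
    return None


links = [
    "https://youtu.be/EDwb9jOVRtU",
    "https://www.youtube.com/shorts/5NPBIwQyPWE",
    "https://www.youtube.com/watch?v=lWA2pjMjpBs",
]
# A set removes duplicates; sorted() gives a stable order for the query file.
print("\n".join(sorted({u for u in map(normalize_video_url, links) if u})))
```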

Output results examples

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit, which lets you output results in any form, as well as in structured formats such as CSV or JSON.

Default output

Result format:

$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n

The result shows the video link, its title, and the number of views, likes, and comments:

https://www.youtube.com/watch?v=5NPBIwQyPWE - Avril Lavigne - Complicated (Official Video)
Views: 571331713, likes: 3959948, comments: 143597
https://www.youtube.com/watch?v=EDwb9jOVRtU - Madonna - Hung Up (Official Video) [HD]
Views: 414662791, likes: 2153344, comments: 91895
https://www.youtube.com/watch?v=lWA2pjMjpBs - Rihanna - Diamonds
Views: 2104207258, likes: 10235971, comments: 394622
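The simple `$name` placeholders in the default format map one-to-one onto result fields. Python's `string.Template` happens to use the same `$name` syntax, so the sketch below reproduces the output above locally. It only illustrates the substitution, it is not how A-Parser renders templates, and it uses a real newline where the A-Parser format contains `\n`.

```python
from string import Template

# Same placeholders as the default result format.
DEFAULT_FORMAT = Template(
    "$query - $title\nViews: $viewsCount, likes: $likesCount, comments: $commentsCount\n"
)


def render_default(result: dict) -> str:
    # substitute() raises KeyError if a placeholder has no matching field.
    return DEFAULT_FORMAT.substitute(result)


row = {
    "query": "https://www.youtube.com/watch?v=lWA2pjMjpBs",
    "title": "Rihanna - Diamonds",
    "viewsCount": 2104207258,
    "likesCount": 10235971,
    "commentsCount": 394622,
}
print(render_default(row), end="")
```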

Subtitles output

Result format:

$query\n$subtitles.format('$text ')\n\n

The result shows the video link followed by the subtitle text in the selected language.
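One way to read `$subtitles.format('$text ')` is: apply the pattern `$text ` to every subtitle entry and concatenate the results, which is why each fragment ends with a space. The Python sketch below imitates that reading with `string.Template`; it is an illustration, not A-Parser's implementation, and `start` is a hypothetical name for the timestamp field.

```python
from string import Template


def format_items(items: list[dict], pattern: str) -> str:
    """Apply `pattern` to each item and concatenate the results."""
    tpl = Template(pattern)
    return "".join(tpl.substitute(item) for item in items)


# Hypothetical subtitle entries; "start" stands in for the timestamp field.
subs = [
    {"text": "Welcome back", "start": 0.0},
    {"text": "to the channel", "start": 2.5},
]
print(format_items(subs, "$text "))              # plain text, space-separated
print(format_items(subs, "[$start] $text\n"), end="")  # one timestamped line per entry
```

Changing only the pattern switches between running text and a timestamped transcript.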

Output in CSV table

The built-in tools.CSVline tool creates valid tabular documents ready for import into Excel or Google Sheets.

General result format:

[% tools.CSVline(query, p1.author, p1.date, p1.duration, p1.title, p1.viewsCount, p1.likesCount, p1.commentsCount, p1.tags.format('$tag,')) %]

File name:

$datefile.format().csv

Initial text:

Link,Author,"Publish date",Duration,Title,"Views count","Likes count","Comments count",Tags

Tip

The Template Toolkit engine is used in the General result format.

In the result file name, simply change the file extension to csv.

To make the "Initial text" option available in the Task Editor, you need to activate "More options". In "Initial text", write the column names separated by commas and make the second line empty.
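If you post-process results outside A-Parser, Python's `csv` module produces the same table layout: one header row and one row per video, with fields containing commas or quotes escaped automatically. This sketch assumes the results are dicts keyed like the `p1.*` fields in the template; the row values are placeholders. Minimal quoting leaves headers such as `Publish date` unquoted, which is equally valid CSV.

```python
import csv
import io

HEADER = ["Link", "Author", "Publish date", "Duration", "Title",
          "Views count", "Likes count", "Comments count", "Tags"]


def to_csv(results: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default: minimal quoting
    writer.writerow(HEADER)
    for r in results:
        writer.writerow([
            r["query"], r["author"], r["date"], r["duration"], r["title"],
            r["viewsCount"], r["likesCount"], r["commentsCount"],
            # Mirrors p1.tags.format('$tag,'): each tag followed by a comma.
            "".join(f"{t}," for t in r["tags"]),
        ])
    return buf.getvalue()


# Placeholder values for illustration only.
row = {
    "query": "https://www.youtube.com/watch?v=XXXXXXXXXXX",
    "author": "Example Author", "date": "2024-01-01", "duration": "3:30",
    "title": 'Song "Live", 2024',
    "viewsCount": 1000, "likesCount": 10, "commentsCount": 2,
    "tags": ["pop", "live"],
}
print(to_csv([row]), end="")
```

The title with embedded quotes and a comma comes out as `"Song ""Live"", 2024"`, the escaping that spreadsheet tools expect.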

Possible settings

Parameter name | Default value | Description
Interface language | English | Interface language selection
Subtitles language | English | Subtitle language selection
Comments pages count | 5 | Number of comment pages
Pages count for replies | 3 | Number of reply pages for each comment
Pages count for related videos | 5 | Number of pages with related videos
Login required is error | | Tells the parser to treat the "authorization required" message as an error and retry