
Telegram::GroupScraper - Scraper for data from public Telegram groups


Overview of the scraper

This scraper collects message data from public Telegram groups. It also collects the group's participants: users who have posted something in the group, or for whom the group shows a service notification about joining. You can scrape all content from the selected groups (text, images and video links) along with the publication date and the author's name, profile link and avatar.

Its operating logic differs from that of other scrapers: it automatically adds queries in order to iterate through all messages in a group. For this reason, it cannot be combined with other scrapers in the same task.
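Public Telegram messages can be opened individually at links of the form https://t.me/<group>/<message number>. A minimal Python sketch of how per-message queries could be derived from a single group link, assuming the scraper walks message numbers in this way; the helper name `message_links` is hypothetical and this is not A-Parser's internal code:

```python
from itertools import islice
from typing import Iterator


def message_links(group_link: str, start: int = 1) -> Iterator[str]:
    """Yield links to individual messages: <group_link>/<start>, <group_link>/<start + 1>, ...

    Hypothetical helper (assumes messages are addressed by sequential numbers).
    The generator is infinite, so the caller decides when to stop.
    """
    base = group_link.rstrip("/")
    message_id = start
    while True:
        yield f"{base}/{message_id}"
        message_id += 1


# Take only the first three links from the infinite sequence
first_three = list(islice(message_links("https://t.me/a_parser"), 3))
print(first_three)
```

Because the sequence has no natural end, some stop rule is needed; in this scraper that role is played by the Max empty posts setting described under Possible Settings.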

Results can be saved in whatever form and structure you need thanks to the powerful built-in Template Toolkit templater, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL and CSV.

Scraper Use Cases

Collected Data

  • Link to the message
  • Author's name, link to their profile, and avatar
  • Message content, which, depending on the message type, can be:
    • text
    • link to photo
    • link to video
  • Message publication date
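One collected record can be modeled as a small Python dataclass and serialized to JSON. In this sketch, `user_name`, `user_link`, `message_text` and `message_date` match the template variables used in the output examples on this page; `message_link`, `user_avatar` and `message_type` are hypothetical field names chosen for illustration, and the sample values are placeholders apart from those taken from the CSV example:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class GroupMessage:
    message_link: str   # hypothetical field name: link to the message
    user_name: str      # author's name
    user_link: str      # link to the author's profile
    user_avatar: str    # hypothetical field name: author's avatar URL
    message_type: str   # hypothetical: "text", "photo" or "video"
    message_text: str   # the text, or a link to the photo/video (assumption)
    message_date: str   # ISO 8601 timestamp, as in the CSV example


msg = GroupMessage(
    message_link="https://t.me/a_parser/1",        # hypothetical message number
    user_name="Forby403",                           # hypothetical display name
    user_link="https://t.me/Forby403",
    user_avatar="https://example.com/avatar.jpg",   # placeholder URL
    message_type="text",
    message_text="including proxies",
    message_date="2016-11-05T05:27:06+00:00",
)
serialized = json.dumps(asdict(msg), ensure_ascii=False)
print(serialized)
```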

Use Cases

  • Collecting a list of group members
  • Collecting the content of all messages in a group
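For the first use case, the member list can be derived from the collected messages by keeping each author's profile link once. A minimal Python sketch, assuming each result is a dict with a `user_link` key (the variable name used in the output templates on this page); it is a post-processing step, not a built-in A-Parser feature:

```python
from typing import Iterable, List


def unique_members(messages: Iterable[dict]) -> List[str]:
    """Return author profile links in order of first appearance, without duplicates."""
    seen = set()
    members = []
    for message in messages:
        link = message.get("user_link")
        # Skip results without an author link and authors already recorded
        if link and link not in seen:
            seen.add(link)
            members.append(link)
    return members


results = [
    {"user_link": "https://t.me/Forby403", "message_text": "Settings - Save window size"},
    {"user_link": "https://t.me/aparser", "message_text": "I'll check now"},
    {"user_link": "https://t.me/Forby403", "message_text": "including proxies"},
]
print(unique_members(results))
```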

Queries

For queries, you should specify a link to a public group, for example:

https://t.me/a_parser

Output Examples

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templater, which lets it output results in any form you need, including structured formats such as CSV or JSON.

Default Output

Result format:

$user_name($user_link): $message_text\n

Example result:

(https://t.me/aparser): To bypass the 10-query limit from one IP, you need to additionally scrape the key= from the main page
(https://t.me/aparser): I'll check now
(https://t.me/aparser): <a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>
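The default format substitutes each $variable with the corresponding field of the result. Python's `string.Template` happens to use the same `$name` placeholder syntax, so it can mimic this format outside A-Parser, for example when testing post-processing scripts. It is an analogy only and says nothing about how A-Parser's templater works internally. The row below reproduces the second line of the example result, where the author name came out empty:

```python
from string import Template

# Same $name placeholders as the default result format
DEFAULT_FORMAT = Template("$user_name($user_link): $message_text\n")

row = {
    "user_name": "",  # empty, as in the example output
    "user_link": "https://t.me/aparser",
    "message_text": "I'll check now",
}
line = DEFAULT_FORMAT.substitute(row)
print(line, end="")  # (https://t.me/aparser): I'll check now
```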

Output in CSV Table

Result format:

[% tools.CSVline(query, user_link, message_date, message_text) %]

Example result:

https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:14:47+00:00,"I run 20 tasks with 300 threads, with a dynamic limit of 1200, they execute much faster because they all work simultaneously and there are no delays when few queries (threads) remain"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"including proxies"
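`tools.CSVline` joins its arguments into one CSV line. A rough Python equivalent using the standard `csv` module is sketched below. Note that `csv.writer`'s default minimal quoting only quotes fields that need it, whereas the example above quotes the message text every time, so the output is not byte-for-byte identical. The message text is a hypothetical value containing a comma and quotes to show the escaping:

```python
import csv
import io


def csv_line(*fields: str) -> str:
    """Join fields into one CSV line; fields with commas, quotes or newlines get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


line = csv_line(
    "https://t.me/a_parser",
    "https://t.me/Forby403",
    "2016-11-05T05:27:06+00:00",
    'Hypothetical text with a comma, and "quotes"',  # illustrative value only
)
print(line, end="")
```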

Results Processing

A-Parser can process results directly during scraping. This section covers the most popular use cases for the Telegram scraper.

Filtering results by occurrence of words in the message

Example

Add a filter, select $message_text - Message text from the dropdown list, and choose the Regex matches type. In the regex field, enter a regex containing the required words:

\bscraper\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b

\b - word boundary

| - OR

is - regex flags: i makes matching case-insensitive, s lets . also match line breaks
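The same filter can be tried outside A-Parser before adding it to a task. A Python sketch using the standard `re` module, with the keywords listed once and `re.IGNORECASE | re.DOTALL` standing in for the i and s flags; Python's regex dialect may not be identical to the one A-Parser uses, so treat this as an approximation. The sample messages are hypothetical:

```python
import re

# Keywords from the filter; IGNORECASE and DOTALL stand in for the "i" and "s" flags
KEYWORDS = re.compile(
    r"\bscraper\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b",
    re.IGNORECASE | re.DOTALL,
)


def matches_keywords(message_text: str) -> bool:
    """True if the message contains at least one keyword as a whole word."""
    return KEYWORDS.search(message_text) is not None


messages = [
    "Which proxy do you use?",       # matches "Proxy" thanks to IGNORECASE
    "Scrapers are great",            # no match: "Scrapers" is not the whole word "scraper"
    "Collecting dorks for Google",   # matches
]
kept = [m for m in messages if matches_keywords(m)]
print(kept)
```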

Download example

How to import the example into A-Parser

eJyVVN1v2jAQ/18sHjaJ8qFSacobRaLaxEpX6BOg6oovqVfH9myHgTL+952dkMC6
PewhVu53v/v2uWQe3Jt7sOjQO5asSmbiP0uYR4mZhfyK6+0V7iE3ElmXGbAObeCu
2LKmJMmd1YVZbC0YtETimEIhPeuWzB8MkrdUSB9V5D5oEpajc5Dhs8c9EWvCsmJb
zMFvXwnegSwCsl6/rIvB6DqN5yCco+ocRuSmRtYvv+i70zqTWAsHUBz3tfAfToIw
TFvSNcYTzgw+nVNHLak2gIZElWjjhVZUinDsuNmcOuGm2lKxhHfMsFd3v1EuYIdL
XfUPW3hK0j3koTEdDh6DtpdGRx8+9nxsKHAuQkSQVYQwsjbqkxI/YmOVJi79WoFu
anUeJx8dBPBwym7FOlFm5KKItt8qG5akIB12maNUp0CJ8D81ggYLXtt57ADhJdNq
LOUMdyhbWvR/WwjJ6X6NUzL6XBv+nTJ/5+PYlHceaof2p6UcGi9Rup1/ba24numM
KudhUFLkwpPsJrpQYTADAt8QTdOz+9CzXFtswnhbYBOc1smg4kRsJzY2LXRRxcVU
LsGtVqnI5pS/FRxPzEItaWfnaqLDRoayVCElTcXhY3s7xq6eQhCayt8ZT2KIUPlp
Y5nXWroviypVYwXdvpuQYE6NPI9au9yClE+Ps3MNa28UCa/eG5f0+76XYx+eq/eD
BUOPmaYbRWUdN83D0rxF5b+el6Q80si+u4fKIBQY6IRRp1zcseHxN9yRrow=

Possible Settings

  • Max empty posts (default: 1000): the number of consecutive empty (non-existent) messages after which scraping of the current query stops.
  • Start message number (default: 1): the message number in the Telegram chat from which collection starts.
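The interaction of these two settings can be sketched as a loop over message numbers that starts at Start message number and stops once Max empty posts consecutive numbers return nothing. Deleted messages leave gaps in the numbering, which is presumably why the default is generous. In this Python sketch, `fetch` is a hypothetical callable standing in for the actual request, and the logic is inferred from the parameter descriptions rather than taken from A-Parser's source:

```python
from typing import Callable, Iterator, Optional


def scrape_group(
    fetch: Callable[[int], Optional[dict]],
    start_message: int = 1,
    max_empty_posts: int = 1000,
) -> Iterator[dict]:
    """Yield messages from start_message upward, stopping after
    max_empty_posts consecutive message numbers return nothing."""
    empty_run = 0
    message_id = start_message
    while empty_run < max_empty_posts:
        message = fetch(message_id)  # hypothetical request for one message
        if message is None:
            empty_run += 1   # another empty (deleted or not yet existing) number
        else:
            empty_run = 0    # any real message resets the counter
            yield message
        message_id += 1


# A fake chat where messages 3 and 4 were deleted
chat = {1: {"id": 1}, 2: {"id": 2}, 5: {"id": 5}}
print([m["id"] for m in scrape_group(chat.get, max_empty_posts=2)])  # [1, 2]: stops before 5
print([m["id"] for m in scrape_group(chat.get, max_empty_posts=3)])  # [1, 2, 5]: gets past the gap
```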