
Telegram::GroupScraper - Scraper of data from public groups in Telegram


Scraper overview

This scraper collects messages from public Telegram groups. It also collects the group members who have written at least one message or for whom there is a service notification about joining the group. You can scrape all content from the groups you need: text, images, and links to videos, along with the publication date and author (name, profile link, avatar) of each message.

Its operating logic differs from that of other scrapers: it automatically adds requests to iterate through all messages in the group. Because of this, it cannot be combined with any other scraper in the same task.

Results can be saved in whatever form and structure you need thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Use cases for the scraper

Collected data

  • Link to the message
  • Author's name, link to their profile, and avatar
  • Message content, depending on the type it can be:
    • text
    • link to a photo
    • link to a video
  • Message publication date

Usage scenarios

  • Collecting a list of group members
  • Collecting the content of all messages in the group

Queries

Specify a link to a public group as the query, for example:

https://t.me/a_parser

Output options for results

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templating engine, which lets it output results in any form, including structured formats such as CSV or JSON.

Default output

Result format:

$user_name($user_link): $message_text\n

Example of result:

(https://t.me/aparser): To get around the limit of 10 requests per IP, you additionally need to parse key= from the main page
(https://t.me/aparser): I'll take a look now
(https://t.me/aparser): <a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>
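
If the default output is post-processed outside A-Parser, each line can be split back into its three fields. The following is a minimal Python sketch, not part of A-Parser itself. It assumes every message fits on one line and that the profile link starts with https://t.me/ and contains no closing parenthesis. The function name and the sample line are made up for illustration.

```python
import re

# One line of the default format: $user_name($user_link): $message_text
# Assumption: the link starts with https://t.me/ and contains no ")".
LINE_RE = re.compile(
    r"^(?P<user_name>.*?)"
    r"\((?P<user_link>https://t\.me/[^)]+)\): "
    r"(?P<message_text>.*)$"
)


def parse_default_line(line):
    """Return a dict with the three fields, or None if the line does not fit."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# Hypothetical sample line, shaped like the example results.
print(parse_default_line("Alice(https://t.me/alice_example): hello (world)"))
```

Messages that span several lines would need a different record separator in the result format, since this sketch treats every line as a separate message.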

Output in CSV table

Result format:

[% tools.CSVline(query, user_link, message_date, message_text) %]

Example of result:

https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:14:47+00:00,"I run 20 tasks with 300 threads each, with a dynamic limit of 1200; they finish much faster because they all work at the same time and nothing stalls when only a few queries (threads) are left"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"well, proxies too"
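
A CSV result file in this four-column layout can serve the "list of group members" scenario by collecting the distinct author links. The following is a minimal Python sketch, not part of A-Parser itself. It assumes the column order query, user_link, message_date, message_text from the result format above. The function name and sample rows are hypothetical.

```python
import csv
import io


def unique_authors(csv_text):
    """Return author profile links in first-seen order."""
    seen = {}
    # newline="" lets the csv module handle line breaks inside quoted fields.
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        if len(row) != 4:
            continue  # skip blank or malformed rows
        query, user_link, message_date, message_text = row
        seen.setdefault(user_link, message_date)
    return list(seen)


# Hypothetical rows in the same layout as the example results.
sample = (
    'https://t.me/a_parser,https://t.me/user_one,2016-11-05T05:01:09+00:00,"first, with a comma"\n'
    'https://t.me/a_parser,https://t.me/user_two,2016-11-05T05:14:47+00:00,"second"\n'
    'https://t.me/a_parser,https://t.me/user_one,2016-11-05T05:27:06+00:00,"third"\n'
)
print(unique_authors(sample))
```

Using the csv module rather than splitting on commas matters here because message text is quoted and may itself contain commas.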

Results processing

A-Parser lets you process results directly during scraping. This section covers the most popular use cases for the Telegram scraper.

Filtering results by the occurrence of words in the message

Example

Add a filter and select $message_text - Message text from the dropdown list. Choose the type Regex matches. In the regex field, enter a regex with the required words:

\bпарсер\b|\bGoogle\b|\byandex\b|\bПрокси\b|\bДорки\b

\b - word boundary

| - OR

is - regex flags (i: case-insensitive matching, s: . also matches newlines)
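
To check which messages such a filter would keep before running a task, the same pattern can be tried outside A-Parser. The following is a minimal Python sketch. It assumes Python's re module behaves closely enough to A-Parser's regex engine for this pattern, with re.IGNORECASE and re.DOTALL standing in for the i and s flags. The function name is hypothetical.

```python
import re

# Alternatives from the filter above; each word must stand alone (\b on both sides).
KEYWORDS = re.compile(
    r"\bпарсер\b|\bGoogle\b|\byandex\b|\bПрокси\b|\bДорки\b",
    re.IGNORECASE | re.DOTALL,
)


def keep_message(message_text):
    """Return True if the message contains at least one of the keywords."""
    return KEYWORDS.search(message_text) is not None


print(keep_message("Нужен парсер для google"))  # keyword present as a whole word
print(keep_message("парсеры"))                  # longer word, \b prevents a match
```

Because of the word boundaries, inflected forms such as "парсеры" are not matched. To catch them, the corresponding \b would have to be removed or the extra forms listed as further alternatives.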

Download example

How to import an example into A-Parser

eJyVVN1v2jAQ/18sHjaJ8qFSacobRaLaxEpX6BOg6oovqVfH9myHgTL+952dkMC6
PewhVu53v/v2uWQe3Jt7sOjQO5asSmbiP0uYR4mZhfyK6+0V7iE3ElmXGbAObeCu
2LKmJMmd1YVZbC0YtETimEIhPeuWzB8MkrdUSB9V5D5oEpajc5Dhs8c9EWvCsmJb
zMFvXwnegSwCsl6/rIvB6DqN5yCco+ocRuSmRtYvv+i70zqTWAsHUBz3tfAfToIw
TFvSNcYTzgw+nVNHLak2gIZElWjjhVZUinDsuNmcOuGm2lKxhHfMsFd3v1EuYIdL
XfUPW3hK0j3koTEdDh6DtpdGRx8+9nxsKHAuQkSQVYQwsjbqkxI/YmOVJi79WoFu
anUeJx8dBPBwym7FOlFm5KKItt8qG5akIB12maNUp0CJ8D81ggYLXtt57ADhJdNq
LOUMdyhbWvR/WwjJ6X6NUzL6XBv+nTJ/5+PYlHceaof2p6UcGi9Rup1/ba24numM
KudhUFLkwpPsJrpQYTADAt8QTdOz+9CzXFtswnhbYBOc1smg4kRsJzY2LXRRxcVU
LsGtVqnI5pS/FRxPzEItaWfnaqLDRoayVCElTcXhY3s7xq6eQhCayt8ZT2KIUPlp
Y5nXWroviypVYwXdvpuQYE6NPI9au9yClE+Ps3MNa28UCa/eG5f0+76XYx+eq/eD
BUOPmaYbRWUdN83D0rxF5b+el6Q80si+u4fKIBQY6IRRp1zcseHxN9yRrow=

Possible settings

Parameter | Default Value | Description
Max empty posts | 1000 | Number of consecutive empty (non-existent) messages after which the scraper stops processing the current query
Start message number | 1 | Message number in the Telegram chat from which to start collecting messages
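
How the two settings interact can be illustrated with a small Python sketch. This is one reading of the descriptions above, not A-Parser's actual implementation. The fetch callable is hypothetical: it stands in for whatever retrieves message number N and returns None for an empty (non-existent) message.

```python
def iterate_messages(fetch, start=1, max_empty=1000):
    """Yield (number, message) pairs, starting at `start`, until
    `max_empty` consecutive message numbers turn out to be empty."""
    number = start
    empty_streak = 0
    while empty_streak < max_empty:
        message = fetch(number)
        if message is None:
            empty_streak += 1      # another gap in a row
        else:
            empty_streak = 0       # any real message resets the counter
            yield number, message
        number += 1


# Hypothetical group where messages 3 and 4 were deleted.
fake_group = {1: "a", 2: "b", 5: "c"}
print(list(iterate_messages(fake_group.get, start=1, max_empty=3)))
```

In this reading, a Max empty posts value that is too small can stop the scraper at a long run of deleted messages before it reaches the end of the group, while a larger value costs extra requests after the last real message.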