Telegram::GroupScraper - Data scraper for public groups in Telegram

Overview of the Telegram::GroupScraper scraper

This scraper collects message data from public Telegram groups. It also collects group participants: every user who has written something in the group or whose joining is recorded in a service notification. You can extract all content from the groups you need (text, images, links to videos) together with the publication date and author (name, profile link, avatar) of each message.

Its logic differs from that of other scrapers: it automatically adds requests to iterate through all messages in the group. Because of this, it cannot be used together with any other scraper in one task.

Results can be saved in whatever format and structure you need thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Use cases for the Telegram public group scraper

User scraping

Scraping users from public groups in Telegram

Scraping all messages

Scraping all messages from public groups in Telegram

Collected data

  • Link to the message
  • Author's name, link to their profile and avatar
  • Message content, which depending on the message type can be:
    • text
    • link to photo
    • link to video
  • Message publication date

Usage options

  • Collecting a list of group members
  • Collecting the content of all messages in the group

Requests

  • Specify links to public groups as requests, for example:
https://t.me/a_parser

Results

By default, the username, profile link, and message text are output:

(https://t.me/aparser): To bypass the limit of 10 requests per IP, you additionally need to parse key= from the main page
(https://t.me/aparser): I'll take a look now
(https://t.me/aparser): <a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>

Output options

Output to CSV

Result format:

[% tools.CSVline(query, user_link, message_date, message_text) %]

Example of the result:

https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:14:47+00:00,"I run 20 tasks with 300 threads each, with a dynamic limit of 1200; they finish much faster because they all work simultaneously and there are no stalls when few requests (threads) are left"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"well, proxies included"
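A file produced with the CSVline template above can be read back with any standard CSV library. The following is a minimal Python sketch, not part of A-Parser: the column names come from the template, while the sample message texts and the read_rows helper are hypothetical and used only for illustration.

```python
import csv
import io
from datetime import datetime

# Column order taken from the template:
# tools.CSVline(query, user_link, message_date, message_text)
FIELDS = ["query", "user_link", "message_date", "message_text"]

# Hypothetical sample rows in the same shape as the example result.
SAMPLE = (
    'https://t.me/a_parser,https://t.me/Forby403,'
    '2016-11-05T05:01:09+00:00,"first message"\n'
    'https://t.me/a_parser,https://t.me/Forby403,'
    '2016-11-05T05:27:06+00:00,"second, with a comma"\n'
)


def read_rows(text):
    """Parse CSV text into dicts and convert the date column to datetime."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        # The dates in the example result are ISO 8601 with a UTC offset.
        row["message_date"] = datetime.fromisoformat(row["message_date"])
        rows.append(row)
    return rows


rows = read_rows(SAMPLE)
for row in rows:
    print(row["user_link"], row["message_date"].date(), row["message_text"])
```

Because the message text is quoted, commas inside a message do not break the column layout.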

Processing results

A-Parser allows you to process results during parsing.

Example of filtering results by word occurrence in a message

Select "Filter results" and choose $message_text from the drop-down list. Set the type to RegEx match. In the regular expression field, enter an expression that lists the required words: \bparser\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b

\b - word boundary

| - OR

is - regular expression flags: i makes matching case-insensitive, s lets . match newlines
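To check which messages such a filter would keep before running a task, the expression can be tried offline. This is a minimal Python sketch under the assumption that the filter performs a regex search with the i and s flags; Python's re module stands in for A-Parser's regex engine here, and the keep helper and sample messages are hypothetical.

```python
import re

# Same word list as in the filter; re.IGNORECASE and re.DOTALL
# correspond to the "i" and "s" flags.
PATTERN = re.compile(
    r"\bparser\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b",
    re.IGNORECASE | re.DOTALL,
)


def keep(message_text):
    """Return True if the message contains any of the listed whole words."""
    return PATTERN.search(message_text) is not None


# Hypothetical messages for illustration.
messages = [
    "Which proxy do you use?",   # kept: "proxy" matches case-insensitively
    "parsers are fun",           # dropped: \b requires the whole word "parser"
    "I tried\nGOOGLE today",     # kept: the match may sit on any line
    "hello",                     # dropped: none of the words occur
]
kept = [m for m in messages if keep(m)]
print(kept)
```

Since this pattern contains no dot, the s flag does not change the result; it only matters for expressions that use . to span several lines.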

Download example

How to import an example into A-Parser

eJyVVN1v2jAQ/18sHjaJ8qFSacobRaLaxEpX6BOg6oovqVfH9myHgTL+952dkMC6
PewhVu53v/v2uWQe3Jt7sOjQO5asSmbiP0uYR4mZhfyK6+0V7iE3ElmXGbAObeCu
2LKmJMmd1YVZbC0YtETimEIhPeuWzB8MkrdUSB9V5D5oEpajc5Dhs8c9EWvCsmJb
zMFvXwnegSwCsl6/rIvB6DqN5yCco+ocRuSmRtYvv+i70zqTWAsHUBz3tfAfToIw
TFvSNcYTzgw+nVNHLak2gIZElWjjhVZUinDsuNmcOuGm2lKxhHfMsFd3v1EuYIdL
XfUPW3hK0j3koTEdDh6DtpdGRx8+9nxsKHAuQkSQVYQwsjbqkxI/YmOVJi79WoFu
anUeJx8dBPBwym7FOlFm5KKItt8qG5akIB12maNUp0CJ8D81ggYLXtt57ADhJdNq
LOUMdyhbWvR/WwjJ6X6NUzL6XBv+nTJ/5+PYlHceaof2p6UcGi9Rup1/ba24numM
KudhUFLkwpPsJrpQYTADAt8QTdOz+9CzXFtswnhbYBOc1smg4kRsJzY2LXRRxcVU
LsGtVqnI5pS/FRxPzEItaWfnaqLDRoayVCElTcXhY3s7xq6eQhCayt8ZT2KIUPlp
Y5nXWroviypVYwXdvpuQYE6NPI9au9yClE+Ps3MNa28UCa/eG5f0+76XYx+eq/eD
BUOPmaYbRWUdN83D0rxF5b+el6Q80si+u4fKIBQY6IRRp1zcseHxN9yRrow=

Possible settings

Parameter | Default value | Description
Max empty posts | 1000 | Number of consecutive empty (non-existent) messages after which parsing stops for the current request
Start message number | 1 | Number of the message in the Telegram chat from which to start collecting
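How the two settings interact can be illustrated with a short sketch. This is an assumption about the general technique (walk message numbers upward and stop after a run of misses), not A-Parser's actual implementation; collect_messages and fetch are hypothetical names.

```python
def collect_messages(fetch, start=1, max_empty=1000):
    """Walk message numbers upward from `start`.

    `fetch(n)` is a hypothetical callable that returns the message data,
    or None when message number n does not exist. Iteration stops once
    `max_empty` consecutive numbers in a row turn out to be empty.
    """
    found = []
    empty_run = 0
    number = start
    while empty_run < max_empty:
        message = fetch(number)
        if message is None:
            empty_run += 1      # one more consecutive miss
        else:
            empty_run = 0       # any existing message resets the counter
            found.append((number, message))
        number += 1
    return found


# Toy group in which only messages 1, 2, 5 and 20 exist.
existing = {1: "a", 2: "b", 5: "c", 20: "z"}
result = collect_messages(existing.get, start=1, max_empty=3)
print(result)
```

In this toy run message 20 is never reached, because numbers 6 to 8 already form three consecutive misses. A limit that is too low can therefore cut off groups with large gaps of deleted messages, which is a reason to keep the value high.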