Telegram::GroupScraper - Data scraper for public groups in Telegram
Overview of the Telegram::GroupScraper scraper
This scraper collects messages from public Telegram groups. It collects participants who have either posted something in the group or appear in a service notification about joining it. You can collect all content from the groups you need (text, images, and links to videos), along with each message's publication date and author (name, profile link, avatar).
Its logic differs from other scrapers: it automatically adds requests to iterate through every message in the group. Because of this, it cannot be combined with any other scraper in the same task.
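The automatic per-message requests can be pictured as a walk over message permalinks. This sketch assumes each message in a public group is reachable at `https://t.me/<group>/<number>` (Telegram's link format for individual public posts). `message_links` is a hypothetical helper that only builds URLs and fetches nothing. It does not describe how A-Parser works internally.

```python
from itertools import islice
from typing import Iterator


def message_links(group_url: str, start: int = 1) -> Iterator[str]:
    """Yield permalinks for consecutive message numbers, beginning at `start`.

    Hypothetical helper: it only shows how one group request can fan out
    into many per-message requests.
    """
    base = group_url.rstrip("/")
    number = start
    while True:
        yield f"{base}/{number}"
        number += 1


# Take the first three links for the example group.
first_three = list(islice(message_links("https://t.me/a_parser"), 3))
print(first_three)  # ['https://t.me/a_parser/1', 'https://t.me/a_parser/2', 'https://t.me/a_parser/3']
```

The generator never stops on its own. In the scraper, the stopping point is governed by the Max empty posts setting and the first number by Start message number.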
Results can be saved in whatever format and structure you need, thanks to the built-in Template Toolkit templating engine. It lets you apply additional logic to the results and output them in various formats, including JSON, SQL, and CSV.
Use cases for the Telegram public group scraper
- User scraping: scraping users from public groups in Telegram
- Scraping all messages: scraping all messages from public groups in Telegram
Collected data
- Link to the message
- Author's name, link to their profile and avatar
- Message content; depending on its type, it can be:
  - text
  - a link to a photo
  - a link to a video
- Message publication date
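To make the list of collected fields concrete, one collected message can be modeled as a small record. This is a sketch under stated assumptions: the field names are hypothetical (they are not A-Parser's output variable names), and it assumes each message carries one of the three content types listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContentType(Enum):
    """Hypothetical tags for the three content types listed above."""
    TEXT = "text"
    PHOTO = "photo"  # content holds a link to the photo
    VIDEO = "video"  # content holds a link to the video


@dataclass
class CollectedMessage:
    """One collected message; field names are illustrative only."""
    message_link: str
    author_name: str
    author_link: str
    author_avatar: Optional[str]  # None when no avatar was collected
    content_type: ContentType
    content: str                  # message text, or a link to the photo/video
    published: str                # ISO 8601 timestamp, e.g. 2016-11-05T05:01:09+00:00

    def is_media(self) -> bool:
        """True when the content is a photo or video link rather than text."""
        return self.content_type is not ContentType.TEXT


# Placeholder values for illustration only.
example = CollectedMessage(
    message_link="https://t.me/a_parser/1",
    author_name="Example Author",
    author_link="https://t.me/example_author",
    author_avatar=None,
    content_type=ContentType.PHOTO,
    content="https://example.com/photo.jpg",
    published="2016-11-05T05:01:09+00:00",
)
print(example.is_media())  # True
```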
Usage options
- Collecting a list of group members
- Collecting the content of all messages in the group
Requests
- Specify a link to a public group as the request, for example:
https://t.me/a_parser
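Before submitting a batch of requests, it can help to check that each line looks like a public group link. This is a minimal sketch that assumes a group link has the form `https://t.me/<name>` with a public username made of letters, digits, and underscores, as in the example above. The pattern is an assumption, not A-Parser's own validation rule, and `group_name` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed shape of a public group link: https://t.me/<username>, optional trailing slash.
GROUP_LINK = re.compile(r"^https://t\.me/([A-Za-z0-9_]+)/?$")


def group_name(request: str) -> Optional[str]:
    """Return the group username from a request link, or None if it does not match."""
    match = GROUP_LINK.match(request.strip())
    return match.group(1) if match else None


print(group_name("https://t.me/a_parser"))     # a_parser
print(group_name("https://t.me/a_parser/42"))  # None: a single-message link, not a group link
```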
Results
By default, the username, profile link, and message text are output:
(https://t.me/aparser): To get around the limit of 10 requests from one IP, you also need to parse key= from the main page
(https://t.me/aparser): I'll take a look now
(https://t.me/aparser): <a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>
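The last example line shows that the message text can contain HTML markup, here a link. If you need the visible text and the link targets separately, a post-processing sketch with Python's standard `html.parser` could look like this. It assumes links arrive as `<a href="...">` tags, as in that example; `split_message` is a hypothetical helper.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class MessageTextParser(HTMLParser):
    """Collect visible text and link targets from a message text value."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: List[str] = []
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep the href of each <a>.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def split_message(html_text: str) -> Tuple[str, List[str]]:
    """Return (plain text, list of link targets) for one message."""
    parser = MessageTextParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.text_parts), parser.links


sample = '<a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>'
text, links = split_message(sample)
print(text)   # http://a-parser.com/threads/1795/
print(links)  # ['http://a-parser.com/threads/1795/']
```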
Output options
Output to CSV
Result format:
[% tools.CSVline(query, user_link, message_date, message_text) %]
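The template joins the four fields into one CSV line. To reproduce similar lines outside A-Parser, here is a sketch with an assumed quoting rule inferred from the example output below: a field is quoted when it contains whitespace, a comma, a double quote, or a line break. This is not taken from the `tools.CSVline` source, so its exact behavior may differ. Python's `csv.QUOTE_MINIMAL` would leave a field like "Settings - Save window size" unquoted, which is why a small custom function is used here. `csv_line` is a hypothetical name.

```python
def csv_line(*fields: str) -> str:
    """Join fields into one CSV line.

    Assumed quoting rule (inferred from the example output, not from
    tools.CSVline itself): quote a field containing whitespace, a comma,
    or a double quote; double any embedded double quotes.
    """
    out = []
    for field in fields:
        if any(ch.isspace() or ch in ',"' for ch in field):
            field = '"' + field.replace('"', '""') + '"'
        out.append(field)
    return ",".join(out)


print(csv_line("https://t.me/a_parser", "https://t.me/Forby403",
               "2016-11-05T05:01:09+00:00", "Settings - Save window size"))
# https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
```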
Example of the result:
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:14:47+00:00,"I run 20 tasks with 300 threads each, with a dynamic limit of 1200; they finish much faster because they all run at the same time, and there are no stalls when only a few requests (threads) are left"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"well, proxies included"
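Each row carries the author's profile link, so the same CSV output can serve the "collecting a list of group members" use case. This is a minimal sketch using Python's standard `csv` module. It assumes the four-column layout of the template above (query, user_link, message_date, message_text), and the sample rows are two lines of the example output. `messages_per_author` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Two rows from the example output; columns: query, user_link, message_date, message_text.
SAMPLE = '''https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"well, proxies included"
'''


def messages_per_author(csv_text: str) -> Counter:
    """Count messages per author profile link; the keys form the member list."""
    counts: Counter = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 4:  # skip blank or malformed lines
            continue
        query, user_link, message_date, message_text = row
        counts[user_link] += 1
    return counts


print(messages_per_author(SAMPLE))  # Counter({'https://t.me/Forby403': 2})
```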
Processing results
A-Parser can process results while parsing is in progress.
Example of filtering results by word occurrence in a message
Select "Filter results", choose $message_text from the drop-down list, and set the type to RegEx match.
In the regular expression field, enter a regular expression containing the words you need: \bparser\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b
- \b: word boundary
- |: OR
- is: regular expression flag
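To check which messages such a filter would keep before running a task, the same pattern can be tried in Python's `re` module. This sketch assumes `is` stands for the `i` (case-insensitive) and `s` (dot matches newline) flags, mapped to `re.IGNORECASE` and `re.DOTALL`. A-Parser's regex engine may differ from Python's in edge cases, and the sample messages are made up.

```python
import re

# Whole-word alternatives; the duplicated \bparser\b alternative is dropped
# because it matches exactly the same thing.
# re.DOTALL has no effect here (the pattern has no "."); it is kept only to mirror "s".
KEYWORDS = re.compile(r"\bparser\b|\bGoogle\b|\byandex\b|\bProxy\b|\bDorks\b",
                      re.IGNORECASE | re.DOTALL)


def keep(message_text: str) -> bool:
    """Return True if the message contains at least one keyword as a whole word."""
    return KEYWORDS.search(message_text) is not None


messages = [
    "Which proxy works best with Google?",
    "I'll take a look now",
    "New dorks list uploaded",
    "parsers are great",  # "parsers" is not the whole word "parser", so \b blocks the match
]
print([m for m in messages if keep(m)])
# ['Which proxy works best with Google?', 'New dorks list uploaded']
```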
Download example
How to import an example into A-Parser
eJyVVN1v2jAQ/18sHjaJ8qFSacobRaLaxEpX6BOg6oovqVfH9myHgTL+952dkMC6
PewhVu53v/v2uWQe3Jt7sOjQO5asSmbiP0uYR4mZhfyK6+0V7iE3ElmXGbAObeCu
2LKmJMmd1YVZbC0YtETimEIhPeuWzB8MkrdUSB9V5D5oEpajc5Dhs8c9EWvCsmJb
zMFvXwnegSwCsl6/rIvB6DqN5yCco+ocRuSmRtYvv+i70zqTWAsHUBz3tfAfToIw
TFvSNcYTzgw+nVNHLak2gIZElWjjhVZUinDsuNmcOuGm2lKxhHfMsFd3v1EuYIdL
XfUPW3hK0j3koTEdDh6DtpdGRx8+9nxsKHAuQkSQVYQwsjbqkxI/YmOVJi79WoFu
anUeJx8dBPBwym7FOlFm5KKItt8qG5akIB12maNUp0CJ8D81ggYLXtt57ADhJdNq
LOUMdyhbWvR/WwjJ6X6NUzL6XBv+nTJ/5+PYlHceaof2p6UcGi9Rup1/ba24numM
KudhUFLkwpPsJrpQYTADAt8QTdOz+9CzXFtswnhbYBOc1smg4kRsJzY2LXRRxcVU
LsGtVqnI5pS/FRxPzEItaWfnaqLDRoayVCElTcXhY3s7xq6eQhCayt8ZT2KIUPlp
Y5nXWroviypVYwXdvpuQYE6NPI9au9yClE+Ps3MNa28UCa/eG5f0+76XYx+eq/eD
BUOPmaYbRWUdN83D0rxF5b+el6Q80si+u4fKIBQY6IRRp1zcseHxN9yRrow=
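If you want to look inside the example before importing it, note that the string above begins with "eJy", which decodes to the bytes 0x78 0x9C, the usual zlib header. This sketch assumes the whole string is base64-encoded zlib data and that the payload is UTF-8 text. Its internal format is not documented here, so treat the decoded output as informational only. `decode_example` is a hypothetical helper. Because the real payload is not reproduced, the demonstration uses a made-up round trip.

```python
import base64
import zlib


def decode_example(blob: str) -> str:
    """Decode an exported example string into text.

    Assumption: base64-encoded zlib data with a UTF-8 payload.
    """
    compact = "".join(blob.split())  # drop the line breaks between chunks
    raw = zlib.decompress(base64.b64decode(compact))
    return raw.decode("utf-8")


# Made-up round trip showing the two encoding layers.
encoded = base64.b64encode(zlib.compress(b'{"example": true}')).decode("ascii")
wrapped = encoded[:8] + "\n" + encoded[8:]  # simulate a line break like in the export
print(decode_example(wrapped))  # {"example": true}
```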
Available settings
| Parameter | Default value | Description |
|---|---|---|
| Max empty posts | 1000 | Number of consecutive empty (non-existent) messages after which parsing of the current request stops |
| Start message number | 1 | Message number in the Telegram chat from which collection starts |
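The interplay of the two settings can be illustrated with a short sketch of the stopping rule as the table describes it: scan upward from the start number and stop after a run of `max_empty` consecutive missing messages. The `exists` callable is a hypothetical stand-in for a real per-message request, and the chat contents in the example are made up.

```python
from typing import Callable, List


def collect_numbers(exists: Callable[[int], bool], start: int = 1,
                    max_empty: int = 1000) -> List[int]:
    """Return existing message numbers, scanning upward from `start`.

    Stops once `max_empty` consecutive numbers are missing.
    """
    found: List[int] = []
    empty_run = 0
    number = start
    while empty_run < max_empty:
        if exists(number):
            found.append(number)
            empty_run = 0  # any existing message resets the run of empties
        else:
            empty_run += 1
        number += 1
    return found


# Made-up chat: messages 1-5 and 8 exist, 6 and 7 were deleted.
present = {1, 2, 3, 4, 5, 8}
print(collect_numbers(present.__contains__, start=1, max_empty=3))  # [1, 2, 3, 4, 5, 8]
print(collect_numbers(present.__contains__, start=1, max_empty=2))  # [1, 2, 3, 4, 5]
```

The second call illustrates the trade-off: if a gap of deleted messages is at least as long as the limit, the scan ends before reaching later messages.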