Telegram::GroupScraper - Scraper of data from public groups in Telegram
Scraper overview
This scraper collects message data from public groups in Telegram. It also collects group members: users who have written something in the group, or for whom there is a service notification about joining the group. You can scrape all content from the groups you need, namely text, images, and links to videos, along with the publication date and author information (name, profile link, avatar).
It works differently from other scrapers because it automatically adds requests to iterate through all messages in the group. For this reason, it cannot be combined with other scrapers in the same task.
Results can be saved in the form and structure you need thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Use cases for the scraper
🔗 User scraping
Scraping users from public groups in Telegram
🔗 Scraping all messages
Scraping all messages from public groups in Telegram
Collected data
- Link to the message
- Author's name, link to their profile, and avatar
- Message content, which depends on the message type:
  - text
  - link to a photo
  - link to a video
- Message publication date
Usage scenarios
- Collecting a list of group members
- Collecting the content of all messages in the group
Queries
As a query, specify a link to a public group, for example:
https://t.me/a_parser
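If queries are generated by a script, it can help to check them before starting a task. The following is a minimal Python sketch, not part of A-Parser: the helper names `group_name_from_query` and `message_link` are hypothetical, and the per-message link form `https://t.me/<group>/<number>` is Telegram's public link format, used here only as an assumption about what a message link looks like.

```python
from urllib.parse import urlparse


def group_name_from_query(query: str) -> str:
    """Return the group name from a link like https://t.me/a_parser."""
    parsed = urlparse(query)
    if parsed.scheme not in ("http", "https") or parsed.netloc != "t.me":
        raise ValueError(f"not a t.me link: {query!r}")
    # Drop empty segments so a trailing slash is tolerated.
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) != 1:
        raise ValueError(f"expected a link to a group, got: {query!r}")
    return parts[0]


def message_link(query: str, number: int) -> str:
    """Build a link to message `number` in the group (assumed link format)."""
    return f"https://t.me/{group_name_from_query(query)}/{number}"


print(group_name_from_query("https://t.me/a_parser"))
print(message_link("https://t.me/a_parser", 42))
```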
Output options for results
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templating engine, which can output results in any form, including structured formats such as CSV or JSON.
Default output
Result format:
$user_name($user_link): $message_text\n
Example of result:
(https://t.me/aparser): To bypass the limit of 10 requests from one IP, you need to additionally parse key= from the main page
(https://t.me/aparser): I'll take a look now
(https://t.me/aparser): <a href="http://a-parser.com/threads/1795/" target="_blank" rel="noopener">http://a-parser.com/threads/1795/</a>
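To see how the default result format expands, here is a small Python analogy. It uses `string.Template` from the standard library, which is not the engine A-Parser uses; it only happens to share the `$name` placeholder syntax. The values are made up for illustration. Note that an empty `$user_name` makes the line start with the parenthesis, as in the example lines above.

```python
from string import Template

# Same placeholders as the default result format.
result_format = Template("$user_name($user_link): $message_text\n")

line = result_format.substitute(
    user_name="",  # empty name: the line starts with "("
    user_link="https://t.me/aparser",
    message_text="I'll take a look now",
)
print(line, end="")
```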
Output in CSV table
Result format:
[% tools.CSVline(query, user_link, message_date, message_text) %]
Example of result:
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:01:09+00:00,"Settings - Save window size"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:14:47+00:00,"I run 20 tasks with 300 threads each, with a dynamic limit of 1200; they complete much faster because they all run at the same time and there are no stalls when few queries (threads) remain"
https://t.me/a_parser,https://t.me/Forby403,2016-11-05T05:27:06+00:00,"well, proxies too"
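A result file in this format can be read back with any CSV library. Below is a minimal Python sketch that loads such rows and parses the date. It assumes that `tools.CSVline` writes standard comma-separated lines with double-quoted text fields, as the example suggests; the column names follow the order of arguments in the result format, and the sample rows are illustrative.

```python
import csv
import io
from datetime import datetime

# Column order taken from the result format above.
FIELDS = ["query", "user_link", "message_date", "message_text"]


def read_results(text: str) -> list:
    """Parse CSV result lines into dicts with a datetime in message_date."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    rows = []
    for row in reader:
        # Dates look like 2016-11-05T05:01:09+00:00 (ISO 8601 with offset).
        row["message_date"] = datetime.fromisoformat(row["message_date"])
        rows.append(row)
    return rows


sample = (
    'https://t.me/a_parser,https://t.me/Forby403,'
    '2016-11-05T05:01:09+00:00,"Settings - Save window size"\n'
    'https://t.me/a_parser,https://t.me/Forby403,'
    '2016-11-05T05:27:06+00:00,"well, proxies too"\n'
)

rows = read_results(sample)
for row in rows:
    print(row["message_date"].isoformat(), row["user_link"], row["message_text"])
```

In practice the text would come from the result file, for example by reading it with `open(path, encoding="utf-8", newline="")`.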
Results processing
A-Parser allows you to process results directly during scraping. This section covers the most popular use cases for the Telegram scraper.
Filtering results by the occurrence of words in the message
Add a filter and select $message_text - Message text from the dropdown list. Choose the type Regex matches.
In the regex field, enter a regex with the required words:
\bпарсер\b|\bGoogle\b|\byandex\b|\bпарсер\b|\bПрокси\b|\bДорки\b
- \b - word boundary
- | - OR
- is - regex flags (i makes the match case-insensitive, s lets . match newlines)
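Before putting a regex into the filter, it can be tried on a few sample messages. The sketch below uses Python's `re` module, which is a different engine from the one in A-Parser, so treat it only as an approximation; `re.IGNORECASE | re.DOTALL` stands in for the `is` flags. The duplicate `\bпарсер\b` alternative is omitted because it does not change what matches, and the sample messages are made up.

```python
import re

# Same word list as the filter regex; \b keeps partial-word hits out.
PATTERN = re.compile(
    r"\bпарсер\b|\bGoogle\b|\byandex\b|\bПрокси\b|\bДорки\b",
    re.IGNORECASE | re.DOTALL,
)


def keep(message_text: str) -> bool:
    """Return True if the message contains any of the listed words."""
    return PATTERN.search(message_text) is not None


messages = [
    "Какой парсер лучше?",    # contains the whole word "парсер"
    "Use google dorks",       # matches "Google" thanks to the i flag
    "парсеры бывают разные",  # "парсеры" is a longer word, so \b rejects it
    "hello",
]
kept = [m for m in messages if keep(m)]
print(kept)
```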
Download example
How to import an example into A-Parser
eJyVVN1v2jAQ/18sHjaJ8qFSacobRaLaxEpX6BOg6oovqVfH9myHgTL+952dkMC6
PewhVu53v/v2uWQe3Jt7sOjQO5asSmbiP0uYR4mZhfyK6+0V7iE3ElmXGbAObeCu
2LKmJMmd1YVZbC0YtETimEIhPeuWzB8MkrdUSB9V5D5oEpajc5Dhs8c9EWvCsmJb
zMFvXwnegSwCsl6/rIvB6DqN5yCco+ocRuSmRtYvv+i70zqTWAsHUBz3tfAfToIw
TFvSNcYTzgw+nVNHLak2gIZElWjjhVZUinDsuNmcOuGm2lKxhHfMsFd3v1EuYIdL
XfUPW3hK0j3koTEdDh6DtpdGRx8+9nxsKHAuQkSQVYQwsjbqkxI/YmOVJi79WoFu
anUeJx8dBPBwym7FOlFm5KKItt8qG5akIB12maNUp0CJ8D81ggYLXtt57ADhJdNq
LOUMdyhbWvR/WwjJ6X6NUzL6XBv+nTJ/5+PYlHceaof2p6UcGi9Rup1/ba24numM
KudhUFLkwpPsJrpQYTADAt8QTdOz+9CzXFtswnhbYBOc1smg4kRsJzY2LXRRxcVU
LsGtVqnI5pS/FRxPzEItaWfnaqLDRoayVCElTcXhY3s7xq6eQhCayt8ZT2KIUPlp
Y5nXWroviypVYwXdvpuQYE6NPI9au9yClE+Ps3MNa28UCa/eG5f0+76XYx+eq/eD
BUOPmaYbRWUdN83D0rxF5b+el6Q80si+u4fKIBQY6IRRp1zcseHxN9yRrow=
Possible settings
Parameter | Default Value | Description |
---|---|---|
Max empty posts | 1000 | Number of consecutive empty (non-existent) messages after which the scraper stops processing the current query |
Start message number | 1 | Number of the message in the Telegram chat from which collection starts |
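The way these two settings interact can be pictured with a short Python sketch. This is an illustration of the described behavior, not A-Parser's actual implementation: `iterate_messages` and the `fetch` callback are hypothetical names, and a missing message is modeled as `None`.

```python
from typing import Callable, Iterator, Optional, Tuple


def iterate_messages(
    fetch: Callable[[int], Optional[str]],
    start_message_number: int = 1,
    max_empty_posts: int = 1000,
) -> Iterator[Tuple[int, str]]:
    """Yield (number, text) until max_empty_posts consecutive numbers are empty."""
    empty_streak = 0
    number = start_message_number
    while empty_streak < max_empty_posts:
        text = fetch(number)
        if text is None:
            empty_streak += 1   # another non-existent message in a row
        else:
            empty_streak = 0    # any real message resets the counter
            yield number, text
        number += 1


# Toy group with a gap at messages 3 and 4.
fake_group = {1: "hello", 2: "world", 5: "late reply"}

collected = list(iterate_messages(fake_group.get, 1, max_empty_posts=3))
stopped_early = list(iterate_messages(fake_group.get, 1, max_empty_posts=2))
print(collected)
print(stopped_early)
```

With a threshold of 3 the gap at messages 3 and 4 is skipped and message 5 is still collected; with a threshold of 2 the same gap ends the run. A threshold that is too low can therefore cut off groups with long runs of deleted messages.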