Cloudflare::Radar - Cloudflare Radar Scraper

Overview of the scraper
The Cloudflare Radar scraper allows you to quickly determine the category of a website by its domain name.
Results can be saved in whatever format and structure you need: the built-in Template Toolkit templating engine lets you apply additional logic to the results and output them in various formats, including JSON, SQL, and CSV.
Data Collected
Data is collected from the radar.cloudflare.com service:
- Site categories
Use Cases
- Determining which site category a domain belongs to
Queries
The queries should be a list of domains, for example:
a-parser.com
yandex.ru
google.com
vk.com
facebook.com
youtube.com
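If the source list contains full URLs rather than bare domains, it can be reduced to the form shown above before it is loaded as queries. The following is a minimal sketch in Python, not part of A-Parser; the helper names `to_domain` and `normalize_queries` and the sample input are hypothetical, and it assumes the scraper expects the host name only.

```python
from urllib.parse import urlsplit


def to_domain(entry):
    """Return the lowercased host name of a URL or bare domain ('' if none)."""
    entry = entry.strip()
    # urlsplit only recognizes the host when a scheme separator is present,
    # so bare domains get a scheme-relative prefix first.
    if "://" not in entry:
        entry = "//" + entry
    return urlsplit(entry).hostname or ""


def normalize_queries(entries):
    """Convert entries to domains, dropping blanks and duplicates, keeping order."""
    domains = (to_domain(e) for e in entries)
    return list(dict.fromkeys(d for d in domains if d))


# Hypothetical input mixing URLs and bare domains.
raw = ["https://a-parser.com/docs/", "google.com", "http://YouTube.com:80/watch?v=1", "google.com"]
queries = normalize_queries(raw)
print("\n".join(queries))
```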
Output Results Examples
A-Parser supports flexible result formatting through the built-in Template Toolkit templating engine, which can output results in any custom layout as well as in structured formats such as CSV or JSON.
Default Output
Result format:
$query: $categories.format('$name, ')\n
Example result showing the category names for each domain:
a-parser.com: Business, Business & Economy,
yandex.ru: News & Media, Entertainment,
vk.com: Social Networks, Society & Lifestyle,
youtube.com: Video Streaming, Entertainment,
facebook.com: Social Networks, Society & Lifestyle,
google.com: Search Engines, Technology,
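A results file in the default format can be read back into a structure for further processing. The sketch below is an illustration in Python, not part of A-Parser; the function name `parse_default_output` is hypothetical, and it assumes that category names themselves never contain a comma.

```python
def parse_default_output(text):
    """Parse lines like 'domain: Category A, Category B, ' into a dict."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon; the domain is on the left.
        domain, _, rest = line.partition(":")
        # The default template leaves a trailing ', ', so drop empty pieces.
        categories = [c.strip() for c in rest.split(",") if c.strip()]
        result[domain.strip()] = categories
    return result


sample = (
    "a-parser.com: Business, Business & Economy, \n"
    "yandex.ru: News & Media, Entertainment, \n"
)
parsed = parse_default_output(sample)
print(parsed["a-parser.com"])  # ['Business', 'Business & Economy']
```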
Output in CSV table
Result format:
[% FOREACH categories;
tools.CSVline(name, desc);
END %]
Example result:
Business,"Sites related to business."
"Business & Economy","Sites that are related to business, economy, finance, education, science and technology."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Search Engines","Sites that allow users to search for content using keywords."
Technology,"Sites related to technology that are not included in the science category."
"News & Media","Sites related to news and media."
Entertainment,"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc."
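Because this template writes only the category name and description, the same category appears once for every domain that belongs to it. One way to collapse the file into a unique category list is sketched below in Python using the standard `csv` module; the function name `load_categories` is hypothetical, and the sample rows are copied from the example above.

```python
import csv
import io


def load_categories(csv_text):
    """Map each category name to its description, ignoring repeated rows."""
    descriptions = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank or malformed lines instead of failing on them.
        if len(row) != 2:
            continue
        name, desc = row
        descriptions[name] = desc
    return descriptions


sample = (
    'Business,"Sites related to business."\n'
    '"Social Networks","Sites that facilitate interaction and networking between people."\n'
    '"Social Networks","Sites that facilitate interaction and networking between people."\n'
)
categories = load_categories(sample)
print(len(categories))  # 2
```

To keep track of which domain each row belongs to, the query would have to be added as an extra column in the result format.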
Dump Results to JSON
General result format:
[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;
obj = {};
obj.query = query;
obj.categories = [];
FOREACH item IN p1.categories;
obj.categories.push({
name = item.name
desc = item.desc
});
END;
obj.json %]
Start Text:
[
End Text:
]
Example result:
[{"query":"yandex.ru","categories":[{"desc":"Sites related to news and media.","name":"News & Media"},{"desc":"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc.","name":"Entertainment"}]},{"query":"google.com","categories":[{"desc":"Sites that allow users to search for content using keywords.","name":"Search Engines"},{"desc":"Sites related to technology that are not included in the science category.","name":"Technology"}]},{"query":"a-parser.com","categories":[{"desc":"Sites related to business.","name":"Business"},{"desc":"Sites that are related to business, economy, finance, education, science and technology.","name":"Business & Economy"}]}]
For the "Start Text" and "End Text" options to be available in the Job Editor, you need to activate "More Options".
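Once the results file is a valid JSON array, it can be consumed by any JSON library. The following is a minimal sketch in Python, assuming the structure shown in the example result (a list of objects with `query` and `categories` keys); the function name `categories_by_domain` is hypothetical.

```python
import json


def categories_by_domain(json_text):
    """Return a dict mapping each queried domain to its list of category names."""
    entries = json.loads(json_text)
    return {
        entry["query"]: [category["name"] for category in entry["categories"]]
        for entry in entries
    }


# One entry copied from the example result above.
sample = (
    '[{"query":"google.com","categories":['
    '{"desc":"Sites that allow users to search for content using keywords.","name":"Search Engines"},'
    '{"desc":"Sites related to technology that are not included in the science category.","name":"Technology"}'
    ']}]'
)
by_domain = categories_by_domain(sample)
print(by_domain["google.com"])  # ['Search Engines', 'Technology']
```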
Available Settings
| Parameter Name | Default Value | Description |
|---|---|---|
| Bypass CloudFlare with Chrome Max Pages | 10 | Maximum number of pages when bypassing Cloudflare via Chrome |
| Bypass CloudFlare with Chrome Headless | ☑ | If enabled, the browser window is not displayed during the Cloudflare bypass via Chrome |
| Use session | ☑ | Saves successful sessions for reuse, which speeds up scraping and reduces errors |