Cloudflare::Radar - Cloudflare Radar scraper
Scraper Overview
The Cloudflare Radar scraper allows you to quickly determine the category of a website by its domain name.
Results can be saved in whatever form and structure you need thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected Data
Data is collected from the service radar.cloudflare.com
- Website categories
Use Cases
- Determining which category a domain's website belongs to
Queries
As queries, you need to specify a list of domains, for example:
a-parser.com
yandex.ru
google.com
vk.com
facebook.com
youtube.com
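If the source list contains full URLs rather than bare domains, it can be reduced to the form shown above before it is passed to the scraper. The following is a minimal sketch that is not part of A-Parser: the helper names `to_domain` and `build_queries` are hypothetical, and it assumes that a lowercase hostname without scheme, port, or path is an acceptable query.

```python
from urllib.parse import urlsplit


def to_domain(line: str) -> str:
    """Return the lowercase hostname of a URL or bare domain ('' if none)."""
    text = line.strip()
    # urlsplit only recognises the host when a scheme or "//" prefix is present
    if "//" not in text:
        text = "//" + text
    return urlsplit(text).hostname or ""


def build_queries(lines):
    """Convert raw lines to a de-duplicated list of domains, keeping order."""
    seen = set()
    queries = []
    for line in lines:
        domain = to_domain(line)
        if domain and domain not in seen:
            seen.add(domain)
            queries.append(domain)
    return queries


if __name__ == "__main__":
    raw = ["https://Google.com/search?q=1", "a-parser.com", "vk.com/feed"]
    print("\n".join(build_queries(raw)))
```

The sketch deliberately leaves prefixes such as `www.` untouched, since whether they affect the returned category is not stated here.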
Output Results Examples
A-Parser supports flexible result formatting through the built-in Template Toolkit templating engine, which can output results in free form as well as in structured formats such as CSV or JSON.
Default Output
Result format:
$query: $categories.format('$name, ')\n
An example of a result that lists the category names for each domain:
a-parser.com: Business, Business & Economy,
yandex.ru: News & Media, Entertainment,
vk.com: Social Networks, Society & Lifestyle,
youtube.com: Video Streaming, Entertainment,
facebook.com: Social Networks, Society & Lifestyle,
google.com: Search Engines, Technology,
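Because the default format appends a comma after every category name, each line ends with a trailing separator. When such a file is processed outside A-Parser, that has to be accounted for. A minimal sketch, assuming the exact line layout shown above and that category names themselves contain no commas; the function name `parse_default_output` is hypothetical:

```python
def parse_default_output(text: str) -> dict:
    """Map each domain to its list of category names.

    Expects lines of the form "domain: Name, Name, " as produced by the
    default result format.
    """
    result = {}
    for line in text.splitlines():
        query, sep, rest = line.partition(": ")
        if not sep:
            continue  # skip blank or malformed lines
        # drop the trailing separator left by the '$name, ' format
        rest = rest.strip().rstrip(",")
        names = [name.strip() for name in rest.split(",") if name.strip()]
        result[query.strip()] = names
    return result


if __name__ == "__main__":
    sample = "a-parser.com: Business, Business & Economy, \n"
    print(parse_default_output(sample))
```

For anything beyond quick inspection, the CSV or JSON formats described on this page are easier to parse reliably.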
Output in CSV Table
Result format:
[% FOREACH categories;
tools.CSVline(name, desc);
END %]
Example result:
Business,"Sites related to business."
"Business & Economy","Sites that are related to business, economy, finance, education, science and technology."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Search Engines","Sites that allow users to search for content using keywords."
Technology,"Sites related to technology that are not included in the science category."
"News & Media","Sites related to news and media."
Entertainment,"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc."
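As the example shows, the same category row is repeated for every domain that belongs to it. If only a list of distinct categories is needed, the CSV can be post-processed outside A-Parser. A minimal sketch using Python's standard `csv` module, assuming two columns per row (name, description) as in the template above; the function name `unique_categories` is hypothetical:

```python
import csv
import io


def unique_categories(csv_text: str) -> dict:
    """Return {category name: description}, keeping the first occurrence."""
    categories = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip empty or malformed rows
        name, desc = row
        categories.setdefault(name, desc)
    return categories


if __name__ == "__main__":
    sample = (
        'Business,"Sites related to business."\n'
        '"Social Networks","Sites that facilitate interaction and networking between people."\n'
        '"Social Networks","Sites that facilitate interaction and networking between people."\n'
    )
    for name, desc in unique_categories(sample).items():
        print(f"{name}: {desc}")
```

The `csv` module handles the quoting, so descriptions that contain commas stay in a single field.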
Dump Results to JSON
General result format:
[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;
obj = {};
obj.query = query;
obj.categories = [];
FOREACH item IN p1.categories;
obj.categories.push({
name = item.name
desc = item.desc
});
END;
obj.json %]
Start Text:
[
End Text:
]
Example result:
[{"query":"yandex.ru","categories":[{"desc":"Sites related to news and media.","name":"News & Media"},{"desc":"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc.","name":"Entertainment"}]},{"query":"google.com","categories":[{"desc":"Sites that allow users to search for content using keywords.","name":"Search Engines"},{"desc":"Sites related to technology that are not included in the science category.","name":"Technology"}]},{"query":"a-parser.com","categories":[{"desc":"Sites related to business.","name":"Business"},{"desc":"Sites that are related to business, economy, finance, education, science and technology.","name":"Business & Economy"}]}]
To make the "Start Text" and "End Text" options available in the Task Editor, you need to activate "More options".
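The template emits one object per query and separates the objects with `,\n`, while the start and end texts wrap them in `[` and `]`, so the finished file is a single JSON array. A minimal sketch of reading such a file outside A-Parser, assuming the structure shown in the example result; the function name `categories_by_domain` is hypothetical:

```python
import json


def categories_by_domain(json_text: str) -> dict:
    """Map each queried domain to the names of its categories."""
    records = json.loads(json_text)
    return {
        record["query"]: [category["name"] for category in record["categories"]]
        for record in records
    }


if __name__ == "__main__":
    sample = (
        '[{"query":"google.com","categories":['
        '{"desc":"Sites that allow users to search for content using keywords.",'
        '"name":"Search Engines"}]}]'
    )
    print(categories_by_domain(sample))
```

If the start or end text is missing, the output is a comma-separated sequence of objects rather than an array, and `json.loads` will raise an error.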
Possible Settings
| Parameter Name | Default Value | Description |
|---|---|---|
| Bypass CloudFlare with Chrome Max Pages | 10 | Max. number of pages when bypassing CF via Chrome |
| Bypass CloudFlare with Chrome Headless | ☑ | If the option is enabled, the browser will not be displayed during CF bypass via Chrome |
| Use session | ☑ | Saves good sessions, allowing for even faster scraping with fewer errors. |