Cloudflare::Radar - Cloudflare Radar Scraper

Overview of the scraper

The Cloudflare Radar scraper lets you quickly determine which categories a website belongs to by its domain name.

Results can be saved in whatever format and structure you need, thanks to the powerful built-in Template Toolkit templater, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL and CSV.

Data Collected

Data is collected from the radar.cloudflare.com service.

  • Site categories (the name and description of each category)

Use Cases

  • Determining which site category a domain belongs to

Queries

The queries should be a list of domains, for example:

a-parser.com  
yandex.ru
google.com
vk.com
facebook.com
youtube.com
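If your source list contains full URLs rather than bare domains, it can be reduced to the one-domain-per-line form shown above before starting a job. The following is a minimal Python sketch that is not part of A-Parser; the helper names `to_domain` and `build_queries` are hypothetical, and it assumes that only the host name (without scheme, port or path) should be submitted as a query.

```python
from urllib.parse import urlsplit


def to_domain(entry: str) -> str:
    """Reduce a URL or bare host name to a lower-case host name.

    Scheme, port, path and query string are dropped; subdomains such
    as "www." are kept as-is.
    """
    entry = entry.strip()
    if not entry:
        return ""
    # urlsplit() only fills .hostname when the host part is marked by "//"
    if "://" not in entry:
        entry = "//" + entry
    return urlsplit(entry).hostname or ""


def build_queries(entries: list[str]) -> list[str]:
    """Return unique, non-empty domains, preserving first-seen order."""
    domains = (to_domain(e) for e in entries)
    return list(dict.fromkeys(d for d in domains if d))
```

Joining the result with `"\n".join(build_queries(raw_lines))` gives text that can be pasted into the queries field.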

Output Results Examples

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templater, which lets it output results in any form, as well as in structured formats such as CSV or JSON.

Default Output

Result format:

$query: $categories.format('$name, ')\n

Example result (this format outputs category names only):

a-parser.com: Business, Business & Economy, 
yandex.ru: News & Media, Entertainment,
vk.com: Social Networks, Society & Lifestyle,
youtube.com: Video Streaming, Entertainment,
facebook.com: Social Networks, Society & Lifestyle,
google.com: Search Engines, Technology,
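The default format prints one line per domain, with every category name followed by `, `, which is why each line ends in a trailing comma. If you need to load this output back into a script, a minimal Python sketch could look like the following; `parse_default_output` is a hypothetical helper, and it assumes that category names never contain commas (true for the names in the example above).

```python
def parse_default_output(text: str) -> dict[str, list[str]]:
    """Parse lines such as 'vk.com: Social Networks, Society & Lifestyle,'
    into {'vk.com': ['Social Networks', 'Society & Lifestyle']}.
    """
    result: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything before the first ": " is the query (the domain)
        domain, _, rest = line.partition(": ")
        # Each category is followed by ", ", so splitting on commas leaves
        # an empty trailing piece, which is filtered out here
        result[domain] = [name.strip() for name in rest.split(",") if name.strip()]
    return result
```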

Output as a CSV table

Result format:

[% FOREACH categories;
tools.CSVline(name, desc);
END %]

Example result:

Business,"Sites related to business."
"Business & Economy","Sites that are related to business, economy, finance, education, science and technology."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Social Networks","Sites that facilitate interaction and networking between people."
"Society & Lifestyle","Sites related to lifestyle that are not included in other categories like fashion, food & drink etc."
"Search Engines","Sites that allow users to search for content using keywords."
Technology,"Sites related to technology that are not included in the science category."
"News & Media","Sites related to news and media."
Entertainment,"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc."
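Note that this template writes only the category name and description, not the domain, so a row is repeated for every domain that shares that category (for example, "Social Networks" appears for both vk.com and facebook.com). As a minimal Python sketch, assuming the output has been saved as plain CSV text, the standard `csv` module handles the quoted fields, including descriptions that contain commas, and a dictionary collapses the repeated rows; `read_categories_csv` is a hypothetical name.

```python
import csv
import io


def read_categories_csv(text: str) -> dict[str, str]:
    """Map each category name to its description.

    csv.reader takes care of quoted fields, including descriptions that
    contain commas; repeated rows simply overwrite the same key, and
    blank lines (empty rows) are skipped.
    """
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}
```

When reading directly from a file instead, the `csv` documentation recommends opening it with `newline=""`.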

Dump Results to JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.categories = [];

FOREACH item IN p1.categories;
obj.categories.push({
name = item.name
desc = item.desc
});
END;

obj.json %]

Start text:

[

End text:

]

Example result:

[{"query":"yandex.ru","categories":[{"desc":"Sites related to news and media.","name":"News & Media"},{"desc":"Sites related to entertainment that are not includeded in other categories like Comic books, Audio streaming, Video streaming etc.","name":"Entertainment"}]},{"query":"google.com","categories":[{"desc":"Sites that allow users to search for content using keywords.","name":"Search Engines"},{"desc":"Sites related to technology that are not included in the science category.","name":"Technology"}]},{"query":"a-parser.com","categories":[{"desc":"Sites related to business.","name":"Business"},{"desc":"Sites that are related to business, economy, finance, education, science and technology.","name":"Business & Economy"}]}]
Tip: the "Start Text" and "End Text" options appear in the Job Editor only after you enable "More Options".
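The general format above writes `,\n` before every record except the first, so the saved file is a valid JSON array only when the start text `[` and end text `]` are in place. Once it is, any JSON parser can read it. A minimal Python sketch, where `categories_by_domain` is a hypothetical helper:

```python
import json


def categories_by_domain(raw: str) -> dict[str, list[str]]:
    """Map each query (domain) to the names of its categories."""
    records = json.loads(raw)
    return {
        record["query"]: [category["name"] for category in record["categories"]]
        for record in records
    }
```

For example, with the results saved to a file (the name `results.json` is hypothetical), `categories_by_domain(open("results.json", encoding="utf-8").read())` returns a dictionary such as `{"yandex.ru": ["News & Media", "Entertainment"], ...}`.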

Available Settings

| Parameter name | Default value | Description |
| --- | --- | --- |
| Bypass CloudFlare with Chrome Max Pages | 10 | Maximum number of pages used when bypassing Cloudflare via Chrome |
| Bypass CloudFlare with Chrome Headless | | If enabled, the browser window is not shown during the Cloudflare bypass via Chrome |
| Use session | | Saves good sessions, which makes scraping faster and reduces errors |