SE::Yandex::SQI - Checking the Site Quality Index in Yandex

Overview of the scraper

SE::Yandex::SQI checks the site quality index (SQI) in Yandex. It is a very fast scraper, with a working speed of 3000-7000 requests per minute.

You can use automatic query multiplication, substitution of sub-queries from files, and iteration over alphanumeric combinations and lists to get the maximum possible number of results. Result filtering lets you clean the output immediately by removing unwanted entries with stop words.

A-Parser lets you save the parsing settings of the SE::Yandex::SQI scraper as presets for future use, set a parsing schedule, and much more.

Results can be saved in the format and structure you need, thanks to the powerful built-in templating engine Template Toolkit, which allows you to apply additional logic to the results and output data in various formats, including JSON, SQL and CSV.

Collected data

  • Site Quality Index (Yandex SQI)
  • Presence of badges on the site (1 - badge obtained, 0 - no badge):
    • Users' Choice
    • Popular Site
    • Secure Connection
    • Turbo Pages
    • Whether the site is official
  • For the badges "Users' Choice" and "Popular Site", you can get the degree of readiness to receive the badge as an intermediate value from 0 to 1, for example 0.4.
  • Number of reviews, rating, and score
  • Store rating in product search and store rating on Yandex Market (if this data is available for the searched site)

Use cases

  • Assessing site usefulness from Yandex's perspective
  • Collecting titles

Queries

Each query must be the domain of the site to check. You can specify it with or without the protocol, for example:

yandex.ru 
google.com
vk.com
facebook.com
https://a-parser.com
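
Since queries may arrive with or without a protocol, it can be convenient to normalize a mixed list before submitting it. The following is a minimal pre-processing sketch in Python that runs outside A-Parser; `normalize_domain` is a hypothetical helper name, not part of the scraper.

```python
from urllib.parse import urlsplit


def normalize_domain(query: str) -> str:
    """Return the bare host for a query given with or without a protocol."""
    query = query.strip()
    # urlsplit only recognizes the host when a "//" prefix is present,
    # so add one to bare domains.
    if "://" not in query:
        query = "//" + query
    return urlsplit(query).hostname or ""


queries = ["yandex.ru ", "google.com", "https://a-parser.com"]
domains = [normalize_domain(q) for q in queries]
```

Normalizing is optional, because the scraper accepts both forms; it mainly helps with de-duplicating a query list.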

Output results examples

A-Parser supports flexible result formatting thanks to the powerful built-in templating engine Template Toolkit, which allows it to output results in an arbitrary form as well as in a structured one, such as CSV or JSON.

Default output

Result format:

$query: $sqi\n

Example of a result showing the initial query and its SQI:

facebook.com: 130000  
yandex.ru: -1
https://a-parser.com: 110
google.com: 120000
vk.com: 340000

If the SQI for the domain is unavailable, the result will be -1.
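
When the default output is processed by another program, the `-1` marker and queries that contain a colon (such as `https://...`) need some care. A minimal Python sketch, assuming the result file uses exactly the `$query: $sqi\n` format shown above; `parse_result_line` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def parse_result_line(line: str) -> Tuple[str, Optional[int]]:
    """Split a 'query: sqi' line; the SQI becomes None when it is -1."""
    # Split at the last ": " because the query itself may contain ':'
    # (for example "https://a-parser.com").
    query, _, value = line.strip().rpartition(": ")
    sqi = int(value)
    return query, (None if sqi == -1 else sqi)


sample = """facebook.com: 130000
yandex.ru: -1
https://a-parser.com: 110
"""
results = dict(
    parse_result_line(line) for line in sample.splitlines() if line.strip()
)
```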

Output to CSV table

Result format:

[% tools.CSVline(query, sqi, rating); %]

File name:

$datefile.format().csv

Initial text:

Domain,SQI,Rating

tip

For the "Initial text" option to be available in the Task Editor, you need to activate "More options". In "Initial text", write the column names separated by commas and leave the second line empty.
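
To illustrate how the resulting file might be consumed, here is a minimal Python sketch that reads it back. It assumes one header line followed by rows in the order passed to `tools.CSVline` (query, sqi, rating); the sample text and the `read_sqi_csv` helper are hypothetical.

```python
import csv
import io


def read_sqi_csv(text: str) -> list:
    """Parse CSV text: a header row, then query, sqi, rating rows."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header written via "Initial text"
    rows = []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        query, sqi, rating = row[:3]
        rows.append({"query": query, "sqi": sqi, "rating": rating})
    return rows


# Illustrative sample; a real file is produced by the task.
sample = "Domain,SQI,Rating\ngoogle.com,122000,87\nvk.com,326000,73\n"
rows = read_sqi_csv(sample)
```

The values are kept as strings here, since a field may be empty when the data is unavailable.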

Saving in SQL format

Result format:

[% "INSERT INTO sqi VALUES('" _ query _ "', '" _ sqi _ "', '" _ rating _ "')\n" %]

Example result:

INSERT INTO sqi VALUES('google.com', '122000', '87')
INSERT INTO sqi VALUES('yandex.ru', 'none', '92')
INSERT INTO sqi VALUES('https://a-parser.com', '200', '')
INSERT INTO sqi VALUES('vk.com', '326000', '73')
INSERT INTO sqi VALUES('facebook.com', '117000', '66')
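
The template above builds each statement by string concatenation, so a value containing a single quote would produce invalid SQL. As an alternative, results exported in another format can be loaded with parameterized queries. The sketch below uses Python's built-in `sqlite3` module; the column names and the in-memory database are assumptions, only the table name `sqi` and the value order come from the template.

```python
import sqlite3

# Rows in the same order as the template: query, sqi, rating.
rows = [
    ("google.com", "122000", "87"),
    ("https://a-parser.com", "200", ""),
]

conn = sqlite3.connect(":memory:")
# Column names are assumed for illustration.
conn.execute("CREATE TABLE sqi (query TEXT, sqi TEXT, rating TEXT)")
# Placeholders let the driver handle quoting of each value.
conn.executemany("INSERT INTO sqi VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT query, sqi, rating FROM sqi ORDER BY query"
).fetchall()
```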

Dump results to JSON

Result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.sqi = p1.sqi;
obj.rating = p1.rating;

obj.json %]

Initial text:

[

Final text:

]

Example result:

[{"query":"vk.com","rating":73,"sqi":326000},
{"query":"google.com","rating":87,"sqi":122000},
{"query":"https://a-parser.com","rating":"","sqi":200},
{"query":"yandex.ru","rating":92,"sqi":"none"},
{"query":"facebook.com","rating":66,"sqi":117000}]

tip

For the "Initial text" and "Final text" options to be available in the Task Editor, you need to activate "More options".
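
In the example result, missing values appear as `""` or `"none"` rather than numbers, so a consumer of the JSON file may want to normalize them. A minimal Python sketch under that assumption; `to_int_or_none` is a hypothetical helper, and the sample is taken from the example result above.

```python
import json

raw = """[{"query":"vk.com","rating":73,"sqi":326000},
{"query":"https://a-parser.com","rating":"","sqi":200},
{"query":"yandex.ru","rating":92,"sqi":"none"}]"""


def to_int_or_none(value):
    """Return an int when the value converts cleanly, otherwise None."""
    try:
        return int(value)
    except (TypeError, ValueError):
        # Covers placeholders such as "" and "none".
        return None


records = [
    {
        "query": item["query"],
        "sqi": to_int_or_none(item["sqi"]),
        "rating": to_int_or_none(item["rating"]),
    }
    for item in json.loads(raw)
]
```

The helper assumes whole-number values as in the example; fractional values would need `float` instead.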

Possible settings

Parameter | Default value | Description
AntiGate preset | default | Selects the Util::AntiGate preset; see the Util::AntiGate documentation for configuration details
AntiGate preset for old captcha | default | Same as AntiGate preset, but used only for regular (old, single image) captchas. If a preset is not selected here, the preset selected in AntiGate preset will be used for such captchas.
Experimental img captcha max count | 5 | Maximum number of repeated captcha images per attempt
Preffered captcha type | Click | Chooses the preferred captcha type: Click or Puzzle
Use sessions | | Saves good sessions, allowing even faster scraping with fewer errors