SE::Yandex::SQI - Site Quality Index Check in Yandex
Scraper Overview
SE::Yandex::SQI checks the site quality index (SQI) in Yandex. It is a very fast scraper, reaching 3000-7000 requests per minute. You can use automatic query multiplication, substitution of subqueries from files, and enumeration of alphanumeric combinations and lists to get the maximum possible number of results. Result filtering lets you clean up the output immediately by removing unwanted entries (using minus-words).
The A-Parser functionality allows you to save the settings of the SE::Yandex::SQI scraper for further use (presets), set a parsing schedule, and much more.
Results can be saved in the form and structure you need, thanks to the built-in powerful template engine Template Toolkit which allows you to apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected Data
- Site Quality Index (Yandex SQI)
- Data on the presence of badges on the site (1 - badge received, 0 - no badge):
  - User's Choice
  - Popular site
  - Secure connection
  - Turbo pages
  - Is the site official
- For the "User's Choice" and "Popular site" badges, it is possible to obtain the readiness level to receive the badge as an intermediate value from 0 to 1, for example 0.4.
- Number of reviews, rating, and score
- Store rating in product search and store rating on Yandex Market (if this data is available for the searched site)
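The badge values described above (1, 0, or an intermediate readiness level) can be turned into readable labels when post-processing results. A minimal Python sketch, assuming the values have already been exported from A-Parser as numbers; the function name `badge_status` is hypothetical and not part of A-Parser:

```python
def badge_status(value: float) -> str:
    """Describe a badge value: 1 = received, 0 = no badge, in between = readiness."""
    if value >= 1:
        return "received"
    if value <= 0:
        return "no badge"
    # Intermediate values show how close the site is to getting the badge.
    return f"in progress ({value:.0%})"

labels = [badge_status(v) for v in (1, 0, 0.4)]
```

Here `0.4` is the example readiness level mentioned in the list above.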
Use Cases
- Evaluation of site usefulness from Yandex's point of view
- Gathering titles
Queries
Specify the domain of the site to check as the query. The domain can be given with or without the protocol, for example:
yandex.ru
google.com
vk.com
facebook.com
https://a-parser.com
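Because queries are accepted both with and without a protocol, a mixed list may contain the same site in two forms. A minimal Python sketch for reducing such a list to bare domains before loading it into a task, for example to deduplicate it; the helper name `to_domain` is hypothetical and not part of A-Parser:

```python
from urllib.parse import urlsplit

def to_domain(query: str) -> str:
    """Return the bare host for a query given with or without a protocol."""
    query = query.strip()
    # urlsplit only fills in the host when a "//" separator is present,
    # so add one for bare domains such as "vk.com".
    if "://" not in query:
        query = "//" + query
    return urlsplit(query).hostname or ""

queries = ["yandex.ru", "https://a-parser.com", "vk.com"]
domains = [to_domain(q) for q in queries]
```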
Output Results Examples
A-Parser supports flexible result formatting thanks to the built-in template engine Template Toolkit, which allows it to output results in an arbitrary form, as well as in a structured form, for example CSV or JSON.
Default Output
Result format:
$query: $sqi\n
Example of a result that shows the initial query and its SQI:
facebook.com: 130000
yandex.ru: -1
https://a-parser.com: 110
google.com: 120000
vk.com: 340000
If the SQI for the domain is unavailable, the result will be -1.
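Lines in the default format can be read back for further processing. A minimal Python sketch, assuming the result file uses the `$query: $sqi` format shown above; the function name `parse_line` is hypothetical:

```python
from typing import Optional, Tuple

def parse_line(line: str) -> Tuple[str, Optional[int]]:
    """Split a '$query: $sqi' line into (query, sqi); sqi is None when unavailable."""
    # Split on the last ': ' because a query such as 'https://a-parser.com'
    # contains a colon of its own.
    query, raw = line.rstrip("\n").rsplit(": ", 1)
    sqi = int(raw)
    # -1 marks a domain whose SQI could not be obtained.
    return query, (None if sqi == -1 else sqi)

sample = ["facebook.com: 130000", "yandex.ru: -1", "https://a-parser.com: 110"]
parsed = [parse_line(s) for s in sample]
```

Splitting from the right keeps queries that include a protocol intact.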
Output in CSV Table
Result format:
[% tools.CSVline(query, sqi, rating); %]
File name:
$datefile.format().csv
Initial text:
Domain,Rating,Author,Price
To make the "Initial text" option available in the Task Editor, activate "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
Saving in SQL Format
Result format:
[% "INSERT INTO sqi VALUES('" _ query _ "', '" _ sqi _ "', '" _ rating _ "')\n" %]
Example of a result:
INSERT INTO sqi VALUES('google.com', '122000', '87')
INSERT INTO sqi VALUES('yandex.ru', 'none', '92')
INSERT INTO sqi VALUES('https://a-parser.com', '200', '')
INSERT INTO sqi VALUES('vk.com', '326000', '73')
INSERT INTO sqi VALUES('facebook.com', '117000', '66')
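The generated statements can be replayed against a database. A minimal Python sketch using SQLite from the standard library; the table schema (three text columns named `query`, `sqi`, `rating`) is an assumption, since the template only fixes the column order:

```python
import sqlite3

# Lines as they appear in the result file (taken from the example above).
lines = [
    "INSERT INTO sqi VALUES('google.com', '122000', '87')",
    "INSERT INTO sqi VALUES('yandex.ru', 'none', '92')",
    "INSERT INTO sqi VALUES('https://a-parser.com', '200', '')",
]

conn = sqlite3.connect(":memory:")
# Assumed schema: three text columns in the same order as the template.
conn.execute("CREATE TABLE sqi (query TEXT, sqi TEXT, rating TEXT)")
for line in lines:
    conn.execute(line)  # each line is one complete INSERT statement
conn.commit()

rows = conn.execute("SELECT query, sqi, rating FROM sqi ORDER BY query").fetchall()
```

Note that the template inserts values verbatim, so a value containing a single quote would produce an invalid statement.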
Dump Results to JSON
General result format:
[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;
obj = {};
obj.query = query;
obj.sqi = p1.sqi;
obj.rating = p1.rating;
obj.json %]
Initial text:
[
Final text:
]
Example of a result:
[{"query":"vk.com","rating":73,"sqi":326000},
{"query":"google.com","rating":87,"sqi":122000},
{"query":"https://a-parser.com","rating":"","sqi":200},
{"query":"yandex.ru","rating":92,"sqi":"none"},
{"query":"facebook.com","rating":66,"sqi":117000}]
To make the "Initial text" and "Final text" options available in the Task Editor, you need to activate "More options".
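A JSON dump in this shape can be loaded with any standard JSON library. A minimal Python sketch, assuming missing values appear as `""` or `"none"` as in the example above; the helper name `as_int` is hypothetical:

```python
import json

raw = """[{"query":"vk.com","rating":73,"sqi":326000},
{"query":"https://a-parser.com","rating":"","sqi":200},
{"query":"yandex.ru","rating":92,"sqi":"none"}]"""

def as_int(value):
    """Return the value if it is an integer, or None for placeholders like '' or 'none'."""
    return value if isinstance(value, int) else None

records = [
    {"query": r["query"], "sqi": as_int(r["sqi"]), "rating": as_int(r["rating"])}
    for r in json.loads(raw)
]
```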
Possible Settings
Parameter | Default value | Description |
---|---|---|
AntiGate preset | default | Selects the preset for Util::AntiGate; see the Util::AntiGate documentation for setup details |
AntiGate preset for old captcha | default | Similar to AntiGate preset, but used only for regular (old, single-image) captchas. If no preset is selected here, the preset chosen in AntiGate preset will be used for such captchas. |
Auto-Solve ClickCaptcha | ☐ | Automatic solving of click captchas (without using services) |
Experimental img captcha max count | 1 | Maximum number of repeated captcha images per attempt |
Use sessions | ☑ | Saves good sessions, allowing for even faster scraping with fewer errors |