SE::Yandex::WordStat - WordStat Scraper. Collecting Keywords and Impressions Statistics

WordStat Scraper Overview

Wordstat is a Yandex service for gauging user interest in various topics and selecting keywords for SEO and contextual advertising. With Yandex Wordstat you can also evaluate the seasonality and geographic dependence of search queries.

The Yandex WordStat keyword scraper supports automatic query multiplication, which helps you collect as many results as Wordstat will return. A-Parser can also automatically follow related queries to a specified depth.

A-Parser lets you save parsing settings as presets for later use, set a parsing schedule, and much more. In addition to automatic query multiplication, you can substitute subqueries from files and enumerate alphanumeric combinations and lists to get the maximum possible number of results when parsing Yandex WordStat.

Results can be saved in whatever form and structure you need thanks to the built-in Template Toolkit template engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Wordstat Scraper Use Cases

Parsing Wordstat in Depth

Using the Yandex WordStat scraper for deep parsing.

Frequency Estimation by WordStat

Accounts

To work with the SE::Yandex::WordStat scraper, Yandex accounts are required. You can register accounts using the SE::Yandex::Register scraper or simply add existing accounts to the files/SE-Yandex/accounts.txt file in the supported format.

Alternatively, you can enable "on-the-fly" account registration.

Collected Data

  • The number of impressions for the specified query
  • The date of the statistics update
  • A list of all keywords related to the specified one and their monthly impressions
  • A list of all additional keywords that users searched for and their monthly impressions

Capabilities

  • Parses the maximum number of results returned by Wordstat: 40 pages of 50 search results each (up to 2,000 results per query)
  • Supports selecting a search region (with subgroups)
  • Can automatically substitute found keywords into queries again (the Parse to level option)
  • Ability to select several regions for evaluation at once
  • Supports automatic bypass of Smart captcha and can bypass graphic captcha using the AntiCaptcha service or any other service that supports them
  • Choice of device type
  • Ability to choose the authentication method
  • Ability to register accounts "on the fly"
  • Supports the advanced account format: can answer the secret question (if the answer is in info) and uses the saved proxy for authorization (if it is in info)
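
The Parse to level option described above amounts to a breadth-first expansion of the keyword set. The Python sketch below only illustrates that idea and is not A-Parser's implementation: `expand_keywords`, `fetch_related`, and `max_level` are hypothetical names, and it assumes that the original queries are always requested and that a found keyword is requested again only if its level does not exceed `max_level`.

```python
from collections import deque
from typing import Callable, Iterable


def expand_keywords(
    seeds: Iterable[str],
    fetch_related: Callable[[str], list],
    max_level: int,
) -> dict:
    """Return every collected keyword mapped to the level it was found at.

    `fetch_related` is a hypothetical callback standing in for one scraper
    request. Seeds are level 0, the keywords they return are level 1, etc.
    """
    found = {}
    queue = deque()
    for seed in seeds:
        if seed not in found:
            found[seed] = 0
            queue.append((seed, 0))
    while queue:
        keyword, level = queue.popleft()
        if level > max_level:
            continue  # beyond the depth limit: keep the keyword, do not query it
        for related in fetch_related(keyword):
            if related not in found:  # skip keywords that were already collected
                found[related] = level + 1
                queue.append((related, level + 1))
    return found
```

With `max_level=0` only the original queries are requested; each extra level feeds the newly found keywords back in as queries, so the number of requests can grow quickly.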

Usage Scenarios

  • Estimating the amount of traffic for a keyword (frequency)
  • Finding new keywords of a similar theme
  • Collecting large databases of keywords of various themes
  • Any other scenarios involving parsing Yandex.WordStat in one way or another

Queries

  • As queries, you need to specify keywords, just as if you were entering them directly into the Wordstat search form, for example:
окна москва  
"окна москва"
!окна !москва
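
If you prepare query lists outside A-Parser, the three forms shown above (plain, quoted, and with `!` before each word) can be generated from one phrase. This is a minimal Python sketch; `wordstat_variants` is a hypothetical helper, not part of A-Parser.

```python
def wordstat_variants(phrase: str) -> list:
    """Return the plain, quoted, and '!'-prefixed forms of a phrase."""
    words = phrase.split()
    plain = " ".join(words)                      # e.g. окна москва
    quoted = '"' + plain + '"'                   # e.g. "окна москва"
    prefixed = " ".join("!" + w for w in words)  # e.g. !окна !москва
    return [plain, quoted, prefixed]
```

Writing the returned strings one per line gives a query file in the same shape as the example above.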

Results

  • The result contains the original query, the number of its impressions, the date of the statistics update, a list of related keywords with their monthly impressions, and a list of additional keywords with their monthly impressions:
!окна !москва - 10368, updated: 16/05/2013  
keywords:
окна москва: 32367
пластиковые окна москва: 8994
окна пвх москва: 4813
купить окна москва: 2561
окна цены москва: 1706
москва работа окна: 1547
вакансии окна москва: 1187
деревянные окна москва: 1087
служба +одного окна москва: 1021
...
additional keywords:
производство окон пвх: 8512
окна rehau: 15686
окна salamander: 1576
окна kbe: 3798
окна кбе: 6089
окна кве: 3227
остекление балконов: 83216
беседки: 471213
остекление лоджий: 26366
офисные перегородки: 18740
монтаж окон: 26223
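
Output in this default layout can be read back into a structure for further processing. The Python sketch below assumes the layout is exactly as in the example above (a header line, then `keywords:` and `additional keywords:` sections with `keyword: count` lines); `parse_result` is a hypothetical helper.

```python
import re

# Header line: "<query> - <count>, updated: <date>"
HEADER = re.compile(r"^(?P<query>.+) - (?P<count>\d+), updated: (?P<date>\S+)$")
SECTIONS = ("keywords:", "additional keywords:")


def parse_result(text: str) -> dict:
    """Parse one result block in the default output layout."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = HEADER.match(lines[0]) if lines else None
    if header is None:
        raise ValueError("missing or unexpected header line")
    result = {
        "query": header["query"],
        "count": int(header["count"]),
        "updated": header["date"],
        "keywords": {},
        "additional keywords": {},
    }
    section = None
    for line in lines[1:]:
        if line in SECTIONS:
            section = line[:-1]  # drop the trailing colon
            continue
        keyword, sep, count = line.rpartition(": ")
        if section is None or not sep or not count.isdigit():
            continue  # skips placeholders such as "..."
        result[section][keyword] = int(count)
    return result
```

Splitting on the last `": "` keeps keywords that themselves contain a colon intact.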

Result Output Options

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit template engine, which can output results in any form, including structured formats such as CSV or JSON.

Outputting the Result in JSON

Result format:

[% data = {};  
data.links = [];
data.updatedate = updatedate;
data.totalcount = totalcount;

FOREACH i IN keys;
item = {};
item.key = i.key;
item.count = i.count;
data.links.push(item);
END;

data.json %]

Result example:

{
"updatedate": "12.03.2014",
"totalcount": "10837937",
"links": [
{
"count": "10837937",
"key": "тест"
},
{
"count": "1164338",
"key": "тест драйв"
},
{
"count": "879980",
"key": "тесто +для теста"
},
{
"count": "792560",
"key": "тесты онлайн"
}
]
}
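
Note that the counts in this JSON are strings, so a consumer has to convert them before sorting or summing. A minimal Python sketch, assuming the `links`, `key`, and `count` field names produced by the template above (`load_keywords` is a hypothetical helper):

```python
import json


def load_keywords(raw: str) -> list:
    """Return (keyword, impressions) pairs sorted by impressions, descending."""
    data = json.loads(raw)
    # The template emits counts as strings, so convert them to integers.
    pairs = [(item["key"], int(item["count"])) for item in data["links"]]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```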

Outputting the Result to CSV

Result format:

[% FOREACH i IN keys;
tools.CSVline(query, i.key, i.count);
END %]

Example of the result:

парсер сайтов, парсер сайтов, 8055
парсер сайтов, бесплатный парсер сайтов, 1122
парсер сайтов, парсер официальный сайт, 666
парсер сайтов, сайты облачный парсер, 507
парсер сайтов, парсер email +с сайта, 477
парсер сайтов, парсер сайта скачать, 434
парсер сайтов, парсер адресов сайтов, 390
парсер сайтов, парсер сайтов онлайн, 366
парсер сайтов, турбо парсер сайтов, 342
парсер сайтов, турбо парсер официальный сайт, 309
парсер сайтов, облачный парсер официальный сайт, 308
парсер сайтов, парсер сайтов excel, 276
парсер сайтов, слиза парсер сайт, 259
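
A file in this shape can be read with Python's standard `csv` module. The sketch below assumes three columns (query, keyword, impressions) separated by a comma and optional spaces, as in the example above; `read_rows` is a hypothetical helper.

```python
import csv
import io


def read_rows(text: str) -> list:
    """Return (query, keyword, impressions) tuples from CSV text."""
    # skipinitialspace drops the spaces that follow each comma
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        query, keyword, count = row
        rows.append((query, keyword, int(count)))
    return rows
```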

Dumping the Result to SQL

Result format:

[% FOREACH i IN keys;
"INSERT INTO serp VALUES('" _ query _ "', '" _ i.key _ "', '" _ i.count _ "')\n";
END %]

Example of the result:

INSERT INTO serp VALUES('тест', 'тест', '10837937')
INSERT INTO serp VALUES('тест', 'тест драйв', '1164338')
INSERT INTO serp VALUES('тест', 'тесто +для теста', '879980')
INSERT INTO serp VALUES('тест', 'тесты онлайн', '792560')
INSERT INTO serp VALUES('тест', 'тест драйв видео', '550164')
INSERT INTO serp VALUES('тест', 'рецепт теста', '484489')
INSERT INTO serp VALUES('тест', 'тесты +с ответами', '449401')
INSERT INTO serp VALUES('тест', 'тест 2014', '427602')
INSERT INTO serp VALUES('тест', 'тесты бесплатно', '315144')
INSERT INTO serp VALUES('тест', 'бесплатные тесты', '315096')
INSERT INTO serp VALUES('тест', 'тесты +для девочек', '309355')
INSERT INTO serp VALUES('тест', 'тесты +по темам', '293917')
INSERT INTO serp VALUES('тест', 'игры тесты', '288989')
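
Because these statements are built by plain string concatenation, a keyword containing a single quote would break the SQL literal. One alternative is to export the rows (for example as CSV) and insert them with bound parameters. The Python sketch below uses SQLite for illustration; the `wordstat` table and its columns are hypothetical names, not something A-Parser creates.

```python
import sqlite3


def store_keywords(conn: sqlite3.Connection, rows) -> None:
    """Insert (query, keyword, impressions) rows using bound parameters."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wordstat "
        "(query TEXT, keyword TEXT, impressions INTEGER)"
    )
    # "?" placeholders let the driver handle quoting, so apostrophes are safe
    conn.executemany("INSERT INTO wordstat VALUES (?, ?, ?)", rows)
    conn.commit()
```

For example, `store_keywords(sqlite3.connect("wordstat.db"), rows)` would write the rows to a local database file.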

See also: Result filters

Possible Settings

| Parameter | Default value | Description |
|---|---|---|
| Pages count | 10 | Number of pages to parse |
| Region | All | Search region |
| Remove + from keywords | | Remove the plus sign (+) from the found queries |
| AntiGate preset | default | You need to configure the Util::AntiGate parser in advance - specify your access key and other parameters, and then select the created preset here |
| AntiGate preset for Login | default | AntiGate preset for login. You need to configure the Util::AntiGate parser with parameters, and then select the created preset here |
| Type | All | Device type selection |
| Accounts | Only from "accounts.txt" | Account operation method selection. Always auto register - always automatically register accounts "on the fly"; you need to select the configured preset in the SE::Yandex::Register preset parameter. Auto register if no more in "accounts.txt" - first use existing accounts from accounts.txt, and if they run out, use automatic registration "on the fly", for which you need to select the configured preset in the SE::Yandex::Register preset parameter. Only from "accounts.txt" - use only existing accounts from accounts.txt, and if they run out, wait for the specified time (Wait new accounts in "accounts.txt" parameter) for new ones to appear |
| Wait new accounts in "accounts.txt" | 0 | Waiting time for new accounts to appear in accounts.txt |
| Remove bad accounts | Always, except wrong login/password | Automatic removal of "bad" accounts. Always - always delete. Always, except wrong login/password - always delete, except when Yandex reports that the login/password is incorrect; Yandex can give such a message for a fully working account with a banned IP, so such accounts can optionally be left for reuse. Never - never delete. Regardless of the selected option, accounts are not deleted in case of proxy/browser errors |
| SE::Yandex::Register preset | default | Selection of settings preset for SE::Yandex::Register |
| Authorization method | HTTP | Authorization method. HTTP - fast, not resource-intensive. Chrome - slow, resource-intensive, theoretically can extend the life of accounts |
| Chrome headless | | If the option is enabled, the browser will not be displayed |
| Use sessions | | Session usage |
| Do not reset session if authorization passed | | Do not reset the session in case of errors if the parser has already been authorized |
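
The Accounts and Remove bad accounts options describe two small decision rules. The Python sketch below restates them as code purely for illustration. It is not A-Parser's implementation: the function names, the return values `"file"`, `"register"`, `"wait"`, and the failure labels are all hypothetical; only the option names come from the settings above.

```python
ACCOUNT_MODES = (
    "Always auto register",
    'Auto register if no more in "accounts.txt"',
    'Only from "accounts.txt"',
)
REMOVE_POLICIES = ("Always", "Always, except wrong login/password", "Never")


def account_source(mode: str, accounts_in_file: int) -> str:
    """Return where the next account comes from: 'register', 'file' or 'wait'."""
    if mode not in ACCOUNT_MODES:
        raise ValueError("unknown Accounts mode: " + mode)
    if mode == "Always auto register":
        return "register"
    if accounts_in_file > 0:
        return "file"
    # accounts.txt is exhausted: register on the fly or wait for new accounts
    return "register" if mode.startswith("Auto register") else "wait"


def should_remove(policy: str, failure: str) -> bool:
    """Decide whether a failed account is deleted.

    `failure` uses hypothetical labels: 'wrong_password', 'proxy_error',
    'browser_error', or any other string for remaining failures.
    """
    if policy not in REMOVE_POLICIES:
        raise ValueError("unknown Remove bad accounts policy: " + policy)
    if failure in ("proxy_error", "browser_error"):
        return False  # never deleted, regardless of the selected option
    if policy == "Never":
        return False
    if policy == "Always":
        return True
    return failure != "wrong_password"
```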