SE::Yandex::WordStat - WordStat Scraper. Collection of keywords and impression statistics

Overview of the scraper
Wordstat is a Yandex service for evaluating user interest in various topics and selecting keywords for search engine optimization and contextual advertising. Yandex Wordstat also lets you assess the seasonality and geographic dependence of search queries.
The Yandex WordStat keyword scraper supports automatic query multiplication, so you can collect the maximum number of results that the service returns. A-Parser can also automatically follow related queries to a specified depth.
A-Parser functionality allows you to save scraping settings for later use (presets), set up a scraping schedule, and much more. You can use automatic query multiplication, substitution of sub-queries from files, iteration of alphanumeric combinations and lists to obtain the maximum possible number of results when scraping Yandex Wordstat.
Results can be saved in whatever form and structure you need, thanks to the powerful built-in Template Toolkit templater, which allows applying additional logic to the results and outputting data in various formats, including JSON, SQL, and CSV.
Use cases for the scraper
- Deep Wordstat Scrape: using the Yandex WordStat scraper for deep scraping
- Frequency Assessment by WordStat
Accounts
Accounts are required for the SE::Yandex::WordStat scraper to work. Accounts can be registered using the SE::Yandex::Register scraper or simply by adding existing accounts to the file files/SE-Yandex/accounts.txt in the supported format.
Alternatively, you can enable account registration "on the fly".
For session-based authorization to work, the data string must be in this format:
[email protected];MAQT78Z31Rinx4H;{"answer":"qmfhsxdcrk","proxy":"185.104.120.45:3128","session_id":"3:1748440908.5.0.1748440867459:ZXBxpg:47e4.1.2:1|2191075974.41.2.2:41.3:1748440908|3:10308131.797655.5pfkoRZWgLJGntKTlcUhYdysNfk"}
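As an illustration of the line layout above (login, password, then a JSON object, separated by semicolons), here is a minimal Python sketch that splits such a line and checks whether it carries a session_id. The function name `parse_account_line` and the sample values are hypothetical; this is not part of A-Parser itself, only a way to validate accounts.txt lines before use.

```python
import json


def parse_account_line(line: str) -> dict:
    """Split 'login;password;{json info}' into its three parts."""
    # maxsplit=2 keeps any ';' that might occur inside the JSON part intact.
    login, password, info_raw = line.strip().split(";", 2)
    info = json.loads(info_raw)
    return {
        "login": login,
        "password": password,
        "info": info,
        # Session-based authorization needs a session_id in the info object.
        "has_session": "session_id" in info,
    }


# Hypothetical sample line; the values are placeholders, not real credentials.
sample = (
    'user@example.com;secret;'
    '{"answer":"abc","proxy":"127.0.0.1:3128","session_id":"3:123.5.0"}'
)
account = parse_account_line(sample)
print(account["login"], account["has_session"])
```

A line without the JSON part raises a ValueError on unpacking, which makes malformed entries easy to spot.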
Collected data
- Number of impressions for the specified query
- Date of statistics update
- List of all keywords related to the specified one and their monthly impressions
- List of all additional keywords searched by users and their monthly impressions

Capabilities
- Scrapes the maximum number of results provided by Wordstat - 40 pages of 50 output elements
- Supports selection of search region (with subgroups)
- Can automatically re-submit found keywords into queries (option Parse to level)
- Ability to select multiple regions for evaluation
- Supports automatic bypass of Smart Captcha and the ability to bypass graphic captcha using the AntiCaptcha service or any other service supporting their API
- Choice of device type
- Ability to choose the authorization method
- Ability to register accounts "on the fly"
- Supports a rich account format: it can answer the secret question (if the answer is in info) and uses the saved proxy for authorization (if it is in info)
Use Cases
- Traffic volume estimation for a keyword (frequency)
- Searching for new similar-topic keywords
- Gathering large databases of keywords on various topics
- Any other options involving Yandex.WordStat scraping in one form or another
Queries
As queries, you must specify keywords, exactly as if they were entered directly into the Wordstat search form, for example:
windows moscow
"windows moscow"
!windows !moscow
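If the query list is prepared outside A-Parser, the three forms shown above (plain phrase, quoted phrase, and every word prefixed with "!") can be generated from one base phrase. The following Python sketch assumes a hypothetical helper named `wordstat_variants`; it only builds strings and does not contact Wordstat.

```python
def wordstat_variants(phrase: str) -> list[str]:
    """Return the plain, quoted, and '!'-prefixed forms of a phrase."""
    words = phrase.split()
    return [
        " ".join(words),                    # windows moscow
        '"' + " ".join(words) + '"',        # "windows moscow"
        " ".join("!" + w for w in words),   # !windows !moscow
    ]


for variant in wordstat_variants("windows moscow"):
    print(variant)
```

The resulting lines can be saved to a file and used as the query source for the scraper.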
Result Output Options
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templater, which allows it to output results in an arbitrary form as well as in structured formats such as CSV or JSON.
Default Output
Result format:
$query - $totalcount, updated: $updatedate\nkeywords:\n$keys.format('$key: $count\n')\nadditional keywords:\n$search.format('$key: $count\n')
The result displays the original query, its number of impressions, the statistics update date, a list of related keywords and their monthly impressions, and a list of additional keywords and their monthly impressions:
!windows !moscow - 10368, updated: 16/05/2013
keywords:
windows moscow: 32367
plastic windows moscow: 8994
pvc windows moscow: 4813
buy windows moscow: 2561
windows prices moscow: 1706
moscow job windows: 1547
windows vacancies moscow: 1187
wooden windows moscow: 1087
single window service moscow: 1021
...
additional keywords:
pvc window production: 8512
rehau windows: 15686
salamander windows: 1576
kbe windows: 3798
kbe windows: 6089
kwe windows: 3227
balcony glazing: 83216
gazebos: 471213
loggia glazing: 26366
office partitions: 18740
window installation: 26223
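When the default output is post-processed by a script, it can be turned back into a structure. The Python sketch below assumes the exact default format shown above (a header line, then the "keywords:" and "additional keywords:" sections); the function name `parse_default_output` is hypothetical, and lines that do not end in a numeric count are skipped.

```python
def parse_default_output(text: str) -> dict:
    """Parse one result block produced by the default result format."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Header: "<query> - <totalcount>, updated: <updatedate>"
    head, updated = lines[0].rsplit(", updated: ", 1)
    query, total = head.rsplit(" - ", 1)
    result = {
        "query": query,
        "totalcount": int(total),
        "updatedate": updated,
        "keys": [],     # related keywords ($keys in the template)
        "search": [],   # additional keywords ($search in the template)
    }
    section = None
    for ln in lines[1:]:
        if ln == "keywords:":
            section = "keys"
        elif ln == "additional keywords:":
            section = "search"
        elif section and ": " in ln:
            key, count = ln.rsplit(": ", 1)
            if count.isdigit():
                result[section].append((key, int(count)))
    return result


sample = """!windows !moscow - 10368, updated: 16/05/2013
keywords:
windows moscow: 32367
plastic windows moscow: 8994
additional keywords:
rehau windows: 15686
"""
parsed = parse_default_output(sample)
print(parsed["query"], parsed["totalcount"], len(parsed["keys"]))
```

For anything beyond quick inspection, the CSV or JSON formats are easier to consume than this text layout.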
Output to CSV table
Result format:
[% FOREACH i IN keys;
tools.CSVline(query, i.key, i.count);
END %]
Example result:
website scraper, website scraper, 8055
website scraper, free website scraper, 1122
website scraper, official website scraper, 666
website scraper, cloud scraper websites, 507
website scraper, email scraper +from website, 477
website scraper, download website scraper, 434
website scraper, website address scraper, 390
website scraper, online website scraper, 366
website scraper, turbo website scraper, 342
website scraper, turbo scraper official website, 309
website scraper, cloud scraper official website, 308
website scraper, excel website scraper, 276
website scraper, sliza scraper website, 259
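A result file in this shape can be read with any CSV library. The Python sketch below uses the standard csv module and assumes three columns per line (query, keyword, impressions), as in the example above; the exact quoting and spacing produced by tools.CSVline may differ from the sample, which csv.reader tolerates. The function name `read_keyword_rows` is hypothetical.

```python
import csv
import io


def read_keyword_rows(text: str) -> list[tuple[str, str, int]]:
    """Read (query, keyword, impressions) rows from CSV text."""
    # skipinitialspace drops the space that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for record in reader:
        if len(record) != 3:
            continue  # skip blank or malformed lines
        query, key, count = record
        rows.append((query, key, int(count)))
    return rows


sample = """website scraper, website scraper, 8055
website scraper, free website scraper, 1122
website scraper, official website scraper, 666
"""
rows = read_keyword_rows(sample)
print(sum(count for _, _, count in rows))
```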
Saving in SQL format
Result format:
[% FOREACH i IN keys;
"INSERT INTO keys VALUES('" _ query _ "', '"; i.key _ "', '"; i.count _ "')\n";
END %]
Example result:
INSERT INTO keys VALUES('test', 'test drive', '1164338')
INSERT INTO keys VALUES('test', 'test +for testing', '879980')
INSERT INTO keys VALUES('test', 'online tests', '792560')
INSERT INTO keys VALUES('test', 'test drive video', '550164')
INSERT INTO keys VALUES('test', 'test recipe', '484489')
INSERT INTO keys VALUES('test', 'tests +with answers', '449401')
INSERT INTO keys VALUES('test', 'test 2014', '427602')
INSERT INTO keys VALUES('test', 'free tests', '315144')
INSERT INTO keys VALUES('test', 'free tests', '315096')
INSERT INTO keys VALUES('test', 'tests +by topics', '293917')
INSERT INTO keys VALUES('test', 'tests +by topics', '293917')
INSERT INTO keys VALUES('test', 'game tests', '288989')
INSERT INTO keys VALUES('test', 'test', '10837937')
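Note that the template above interpolates values directly into the statement, so a keyword containing an apostrophe would produce invalid SQL. One alternative is to export rows (for example as CSV) and load them with placeholders. The Python sketch below does this with the standard sqlite3 module; the table name `wordstat_keys`, its columns, and the second sample row are hypothetical.

```python
import sqlite3


def store_keywords(conn: sqlite3.Connection, rows) -> None:
    """Insert (query, keyword, impressions) rows using placeholders."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wordstat_keys "
        "(query TEXT, keyword TEXT, impressions INTEGER)"
    )
    # The driver handles quoting, so apostrophes in keywords are safe.
    conn.executemany("INSERT INTO wordstat_keys VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_keywords(
    conn,
    [
        ("test", "test drive", 1164338),
        ("test", "driver's test", 100),  # hypothetical row with an apostrophe
    ],
)
total = conn.execute("SELECT SUM(impressions) FROM wordstat_keys").fetchone()[0]
print(total)
```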
Dump results to JSON
General result format:
[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;
obj = {};
obj.updatedate = p1.updatedate;
obj.totalcount = p1.totalcount;
obj.keys = [];
FOREACH item IN p1.keys;
obj.keys.push({
key = item.key
count = item.count
});
END;
obj.json %]
Initial text:
[
Final text:
]
Example result:
[{
"updatedate": "12.03.2014",
"totalcount": "10837937",
"keys": [
{
"count": "10837937",
"key": "test"
},
{
"count": "1164338",
"key": "test drive"
},
{
"count": "879980",
"key": "test +for test"
},
{
"count": "792560",
"key": "online tests"
}
]
}]
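A dump in this shape is a JSON array with one object per query, and the counts are stored as strings. As a sketch of how it might be consumed, the Python snippet below loads the array and returns the keywords with the most impressions; the function name `top_keywords` is hypothetical, and the sample is an abbreviated copy of the example above.

```python
import json


def top_keywords(dump_text: str, limit: int = 3) -> list[tuple[str, int]]:
    """Return the keywords with the highest impression counts."""
    pairs = []
    for result in json.loads(dump_text):
        for item in result["keys"]:
            # Counts are strings in the dump, so convert them before sorting.
            pairs.append((item["key"], int(item["count"])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:limit]


sample = """[{
  "updatedate": "12.03.2014",
  "totalcount": "10837937",
  "keys": [
    {"count": "1164338", "key": "test drive"},
    {"count": "10837937", "key": "test"},
    {"count": "792560", "key": "online tests"}
  ]
}]"""
best = top_keywords(sample, limit=2)
print(best)
```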
See also: Results filters
Possible settings
| Parameter | Default Value | Description |
|---|---|---|
| Pages count | 10 | Number of pages for scraping |
| Region | All | Search region |
| Remove + from keywords | ☐ | Remove plus symbol (+) from found queries |
| AntiGate preset | default | You must first configure the Util::AntiGate scraper - specify your access key and other parameters, then select the created preset here |
| AntiGate preset for Login | default | AntiGate preset for login. You must first configure the Util::AntiGate scraper with parameters, then select the created preset here |
| Type | All | Choice of device type |
| Accounts | Only from "accounts.txt" | Choice of account operation method: Always auto register - always automatically register accounts "on the fly", requires selecting a configured preset in the SE::Yandex::Register preset parameter. Auto register if no more in "accounts.txt" - first use existing accounts from accounts.txt, and if they run out - automatic registration "on the fly" is used, for which a configured preset in the SE::Yandex::Register preset parameter must be selected. Only from "accounts.txt" - only use existing accounts from accounts.txt, and if they run out - wait for a specified time (Wait new accounts in "accounts.txt" parameter) for new ones to appear. Only by session_id from "accounts.txt" - authorization by cookies. |
| Wait new accounts in "accounts.txt" | 0 | Wait time for new accounts to appear in accounts.txt |
| Remove bad accounts | Always, except wrong login/password | Automatic deletion of "bad" accounts: Always - always delete. Always, except wrong login/password - always delete, except when Yandex reports an incorrect login/password. Yandex can return this message for a fully working account when the IP is banned, so such accounts can optionally be kept for reuse. Never - never delete. Regardless of the selected option, accounts are not deleted in case of proxy/browser errors |
| SE::Yandex::Register preset | default | Select settings preset for SE::Yandex::Register |
| Authorization method | HTTP | Authorization method: HTTP - fast, not resource-intensive. Chrome - slow, resource-intensive, theoretically can extend account life |
| Chrome headless | ☑ | If the option is enabled, the browser will not be displayed |
| Use sessions | ☑ | Use sessions |
| Do not reset session if authorization passed | ☑ | Do not reset the session on errors if the scraper is already authorized |
| Use Wordstat 2 | ☐ | Use Wordstat 2 |
| Wordstat 2 parse all table data | ☑ | Allows exporting all 2000 results for a query immediately without going through pagination |

