SE::Yandex::Speller - Checking pages for text errors via Yandex.Speller
Overview of the scraper

SE::Yandex::Speller finds spelling errors in Russian, Ukrainian, or English text, or in the text of a specified page, using the Yandex.Speller service. The service's language models include hundreds of millions of words and phrases.
A-Parser lets you save the settings of the SE::Yandex::Speller scraper for future use (presets), set up a scraping schedule, and much more.
Results can be saved in whatever form and structure you need: the built-in Template Toolkit templating engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected data
- Text blocks in which errors are found
Capabilities
- Determining the number of blocks with errors
- Displaying possible reasons for errors in the text
Use cases
- Counting the text blocks that contain errors
- Checking website pages for spelling errors
Queries
The scraper can accept both keywords (text strings) and links to pages as input. The query type is determined automatically.
- Example queries in the form of text strings:
Text to be checked by the Yandex Speller scraper
Query with a typo
- Example queries in the form of a website page address to be checked:
https://a-parser.com/
https://en.wikipedia.org/wiki/Parsing
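The exact rule the scraper uses to tell a text query from a link is not described here. The following is only an illustrative Python sketch of one plausible rule (an absolute http/https URL without whitespace counts as a link, everything else as text); `query_kind` is a hypothetical helper, not part of A-Parser.

```python
from urllib.parse import urlparse


def query_kind(query: str) -> str:
    """Classify a query as "url" or "text" (assumed rule, not A-Parser's own)."""
    candidate = query.strip()
    # Text to be spell-checked normally contains spaces; a page address does not.
    if any(ch.isspace() for ch in candidate):
        return "text"
    parsed = urlparse(candidate)
    # Treat only absolute http(s) addresses with a host part as links.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "text"


for q in ("Query with a typo", "https://a-parser.com/"):
    print(q, "->", query_kind(q))
```

A pre-check like this can be useful when preparing a query file, for example to keep plain-text and page queries in separate tasks.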
Output result examples
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit templating engine, which can output results in an arbitrary form as well as in structured formats such as CSV or JSON.
Default output
Result format:
$query: $total\n$errors.format('$word ($suggest) - $type\n')
Example result:
Query with a typo: 1
typo (error, sheathing) - Word not in dictionary.
Text to be checked by the Yandex Speller scraper: 0
https://a-parser.com/: 10
suggestion (suggestions) - Word not in dictionary.
data (data, data) - Word not in dictionary.
MOZ (DMOZ) - Word not in dictionary.
NodeJS (Node JS) - Word not in dictionary.
Develop (Develop) - Word not in dictionary.
...
https://en.wikipedia.org/wiki/Parsing: 183
• العربية (• العربية) - Text contains too many errors.
• বাংলা (• বাংলা) - Text contains too many errors.
...
material (material) - Word not in dictionary.
parsed (passed) - Word not in dictionary.
they (that) - Word not in dictionary.
...
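Because the default format is line-oriented, a result file can be read back with two regular expressions. The following is a minimal Python sketch, assuming the default format string is used unchanged and that multiple suggestions are comma-separated as in the example; `parse_default_output` is a hypothetical helper, not part of A-Parser.

```python
import re

# "<query>: <total>" header line, e.g. "https://a-parser.com/: 10"
QUERY_LINE = re.compile(r"^(?P<query>.+): (?P<total>\d+)$")
# "<word> (<suggest>) - <type>" error line
ERROR_LINE = re.compile(r"^(?P<word>.+?) \((?P<suggest>.*)\) - (?P<type>.+)$")


def parse_default_output(text):
    """Group error lines under the query header that precedes them."""
    results = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line == "...":  # skip blanks and elision markers
            continue
        header = QUERY_LINE.match(line)
        if header:
            results.append({"query": header["query"],
                            "total": int(header["total"]),
                            "errors": []})
            continue
        error = ERROR_LINE.match(line)
        if error and results:
            suggestions = [s.strip() for s in error["suggest"].split(",") if s.strip()]
            results[-1]["errors"].append({"word": error["word"],
                                          "suggest": suggestions,
                                          "type": error["type"]})
    return results


sample = (
    "Query with a typo: 1\n"
    "typo (error, sheathing) - Word not in dictionary.\n"
    "Text to be checked by the Yandex Speller scraper: 0\n"
)
parsed = parse_default_output(sample)
print(parsed)
```

If the checked text itself can contain sequences such as " - " or ": 5" at the end of a line, a structured format like JSON is a safer choice than parsing this plain-text layout.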
Saving in SQL format
Result format:
[% FOREACH errors;
"INSERT INTO errors VALUES('" _ word _ "', '" _ suggest _ "', '" _ type _ "')\n";
END %]
Example result:
INSERT INTO errors VALUES('SaaS', 'Seas', 'Word not in dictionary.')
INSERT INTO errors VALUES('freelancers', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('Arbitrazhniki', 'Arbitrazh niki', 'Word not in dictionary.')
INSERT INTO errors VALUES('Youtube', 'YouTube', 'Incorrect use of upper and lower case letters.')
INSERT INTO errors VALUES('emails', 'mails', 'Word not in dictionary.')
INSERT INTO errors VALUES('WordStat', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('Linkbuilding', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('outreach', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('Alexa', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('SEMRush', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('Ahrefs', 'Href', 'Word not in dictionary.')
INSERT INTO errors VALUES('MajesticSEO', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('SerpStat', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('freelancers', '', 'Word not in dictionary.')
INSERT INTO errors VALUES('SaaS', 'Saab,Seas,SAS', 'Word not in dictionary.')
INSERT INTO errors VALUES('SaaS', 'Seas,SAS', 'Word not in dictionary.')
INSERT INTO errors VALUES('NodeJS', 'Nodes', 'Word not in dictionary.')
INSERT INTO errors VALUES('NodeJS', 'Nodes', 'Word not in dictionary.')
INSERT INTO errors VALUES('async', 'sync', 'Word not in dictionary.')
INSERT INTO errors VALUES('lead generation', 'lead generation', 'Word not in dictionary.')
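Note that the template above concatenates values into the statement as-is, so a word containing an apostrophe (for example Parser'a) would produce an invalid INSERT. One possible alternative, sketched here in Python under stated assumptions, is to export the errors as structured data and load them with parameterized queries. The column names `word`, `suggest`, and `type` and the helper `save_errors` are assumptions made for this illustration; SQLite is used only because it needs no server.

```python
import sqlite3


def save_errors(conn, errors):
    """Insert error records using placeholders, so quotes need no manual escaping."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS errors (word TEXT, suggest TEXT, type TEXT)"
    )
    conn.executemany(
        "INSERT INTO errors VALUES (?, ?, ?)",
        [(e["word"], e["suggest"], e["type"]) for e in errors],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_errors(conn, [
    {"word": "Youtube", "suggest": "YouTube",
     "type": "Incorrect use of upper and lower case letters."},
    {"word": "Parser'a", "suggest": "", "type": "Word not in dictionary."},
])
rows = conn.execute("SELECT word, suggest FROM errors").fetchall()
print(rows)
```

The placeholder form also avoids SQL injection when the checked pages are not under your control.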
Dump results to JSON
General result format:
[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;
obj = {};
obj.errors = p1.errors;
obj.json %]
Initial text:
[
Final text:
]
Example result:
[{"errors": [{"word":"SaaS","suggest":"Seas","type":"Word not in dictionary."},{"word":"freelancers","suggest":"","type":"Word not in dictionary."},{"word":"Arbitrageurs","suggest":"Arbitrage people","type":"Word not in dictionary."},{"word":"Youtube","suggest":"YouTube","type":"Incorrect use of upper and lower case letters."},{"word":"emails","suggest":"mails","type":"Word not in dictionary."},{"word":"WordStat","suggest":"","type":"Word not in dictionary."},{"word":"Linkbuilding","suggest":"","type":"Word not in dictionary."},{"word":"outreach","suggest":"","type":"Word not in dictionary."},{"word":"Alexa","suggest":"","type":"Word not in dictionary."},{"word":"SEMRush","suggest":"","type":"Word not in dictionary."},{"word":"Ahrefs","suggest":"Href","type":"Word not in dictionary."},{"word":"MajesticSEO","suggest":"","type":"Word not in dictionary."},{"word":"SerpStat","suggest":"","type":"Word not in dictionary."},{"word":"freelancers","suggest":"","type":"Word not in dictionary."},{"word":"SaaS","suggest":"Saab,Seas,SAS","type":"Word not in dictionary."},{"word":"SaaS","suggest":"Seas,SAS","type":"Word not in dictionary."},{"word":"NodeJS","suggest":"Nodes","type":"Word not in dictionary."},{"word":"Parser'a","suggest":"","type":"Word not in dictionary."},{"word":"NodeJS","suggest":"Nodes","type":"Word not in dictionary."},{"word":"async","suggest":"sync","type":"Word not in dictionary."},{"word":"lead generation","suggest":"lead generation","type":"Word not in dictionary."},{"word":"Parse","suggest":"Pair","type":"Word not in dictionary."},{"word":"Instagram","suggest":"","type":"Word not in dictionary."},{"word":"marketplaces","suggest":"","type":"Word not in dictionary."},{"word":"marketplaces","suggest":"","type":"Word not in dictionary."},{"word":"marketplace","suggest":"","type":"Word not in dictionary."},{"word":"Instagram","suggest":"","type":"Word not in dictionary."},{"word":"Bing","suggest":"","type":"Word not in dictionary."},{"word":"newsmen","suggest":"","type":"Word not in dictionary."},{"word":"Redis","suggest":"","type":"Word not in dictionary."},{"word":"scrape","suggest":"","type":"Word not in dictionary."},{"word":"captcha","suggest":"","type":"Word not in dictionary."},{"word":"XEvil","suggest":"Evil,Devil","type":"Word not in dictionary."},{"word":"CapMonster","suggest":"Cap Monster","type":"Word not in dictionary."},{"word":"Captcha","suggest":"","type":"Word not in dictionary."},{"word":"RuCaptcha","suggest":"","type":"Word not in dictionary."},{"word":"scrape","suggest":"argue","type":"Word not in dictionary."},{"word":"scrape","suggest":"","type":"Word not in dictionary."},{"word":"scrape","suggest":"request","type":"Word not in dictionary."},{"word":"brief","suggest":"","type":"Word not in dictionary."},{"word":"tickets","suggest":"","type":"Word not in dictionary."},{"word":"Parser’om","suggest":"","type":"Word not in dictionary."},{"word":"Parser'om","suggest":"","type":"Word not in dictionary."},{"word":"tools","suggest":"nodes, aces, tuls","type":"Word not in dictionary."}]}]
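A JSON dump of this shape is convenient for post-processing, for instance to see which words are flagged most often. The following is a minimal Python sketch, assuming the dump is a JSON array of objects with an `errors` list, as in the example; `summarize` is a hypothetical helper, and the sample is a shortened excerpt of the result above.

```python
import json
from collections import Counter


def summarize(dump_text):
    """Count how often each (word, error type) pair occurs across all results."""
    counts = Counter()
    for result in json.loads(dump_text):
        for err in result.get("errors", []):
            counts[(err["word"], err["type"])] += 1
    return counts


sample = (
    '[{"errors": ['
    '{"word":"SaaS","suggest":"Seas","type":"Word not in dictionary."},'
    '{"word":"SaaS","suggest":"Seas,SAS","type":"Word not in dictionary."},'
    '{"word":"Youtube","suggest":"YouTube",'
    '"type":"Incorrect use of upper and lower case letters."}'
    ']}]'
)
counts = summarize(sample)
for (word, error_type), n in counts.most_common():
    print(n, word, error_type)
```

Frequently repeated entries such as brand names are usually false positives, so a summary like this helps decide which words to filter out before reviewing the rest.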
Possible settings
| Parameter | Default value | Description |
|---|---|---|
| Languages | English, Russian, Ukrainian | Languages to check |
| Options | Skip words written in uppercase letters, e.g. "ВПК"; skip words with digits, e.g. "авп17х4534"; skip internet addresses, email addresses, and file names; ignore Roman numerals ("I, II, III, ...") | Spell-checking options |
| HTML::TextExtractor preset | default | Preset for HTML::TextExtractor; defines the text extraction settings |
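For reference, the four default options correspond to flags of the public Yandex.Speller API, which combines them into a single integer bitmask. The sketch below shows how such a mask is built in Python. The numeric values follow the Yandex.Speller API option list and should be checked against it before use; how A-Parser passes the options internally is not described here, and `SpellerOption` and `options_mask` are hypothetical names used only for illustration.

```python
from enum import IntFlag


class SpellerOption(IntFlag):
    # Values assumed from the Yandex.Speller API option list; verify before use.
    IGNORE_UPPERCASE = 1          # skip words written in uppercase letters
    IGNORE_DIGITS = 2             # skip words with digits
    IGNORE_URLS = 4               # skip internet addresses, emails, file names
    IGNORE_ROMAN_NUMERALS = 2048  # ignore Roman numerals


def options_mask(*options):
    """Combine individual options into the integer bitmask."""
    mask = SpellerOption(0)
    for option in options:
        mask |= option
    return int(mask)


default_mask = options_mask(
    SpellerOption.IGNORE_UPPERCASE,
    SpellerOption.IGNORE_DIGITS,
    SpellerOption.IGNORE_URLS,
    SpellerOption.IGNORE_ROMAN_NUMERALS,
)
print(default_mask)  # 1 + 2 + 4 + 2048 = 2055
```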
