SE::Seznam - scraper for the Czech search engine seznam.cz
Scraper overview
SE::Seznam scrapes Seznam search results. With the Seznam scraper, you can collect large databases of links ready for further use. Queries are written exactly as you would enter them in the Seznam search bar, including search operators (site, inurl, etc.).
A-Parser's functionality allows you to save the Seznam scraper parsing settings for further use (presets), set up a parsing schedule, and much more. You can use automatic query multiplication, substitution of subqueries from files, enumeration of alphanumeric combinations and lists to obtain the maximum possible number of results.
Results can be saved in the format and structure you need, thanks to the powerful built-in templating engine Template Toolkit, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected data
- Links, anchors, and snippets from the search results
- List of related keywords
Capabilities
- Scrapes up to the maximum number of results Seznam returns: 50 pages of 20 results each
- Maximum total number of results per query: 1000
Use cases
- Collecting link databases - for A-Poster, XRumer, AllSubmitter, etc.
- Backlink (mention) search for websites
- Vulnerable website search
- Any other use cases involving scraping Seznam in one way or another
Queries
Queries should be specified as search phrases, just as if you were entering them directly into the Seznam search form, for example:
test query
окна Москва
site:a-parser.com
inurl:auto
Query substitutions
You can use built-in macros to expand queries. For example, to build a very large database of forums, specify several base queries in different languages:
forum
форум
foro
论坛
In the query format, add an enumeration of character combinations from a to zzzz. This rotates the search results as much as possible and yields many new unique results:
$query {az:a:zzzz}
This macro will create an additional 475254 queries for each original search query, which will result in 4 x 475254 = 1901016 search queries in total, an impressive number, but not a problem for A-Parser. At a speed of 2000 requests per minute, this task will be completed in just 16 hours.
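The figures above can be checked with a short Python sketch. It runs outside A-Parser and only reproduces the arithmetic; the helper name `az_range` is hypothetical and does not describe how the `{az:a:zzzz}` macro is implemented internally.

```python
from itertools import product
from string import ascii_lowercase


def az_range(max_len):
    """Yield every lowercase string of length 1..max_len ('a' .. 'z' * max_len)."""
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


base_queries = ["forum", "форум", "foro", "论坛"]

# Number of combinations from a to zzzz: 26 + 26^2 + 26^3 + 26^4
suffix_count = sum(1 for _ in az_range(4))
total_queries = len(base_queries) * suffix_count

# Estimated duration at 2000 requests per minute, converted to hours
hours = total_queries / 2000 / 60

print(suffix_count, total_queries, round(hours, 1))  # 475254 1901016 15.8
```

The estimate of roughly 15.8 hours matches the 16 hours quoted above after rounding.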
Using operators
You can use search operators in the query format, so that the operator is automatically added to each query in your list:
site:$query
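To illustrate what this query format produces, here is a minimal Python sketch that applies the same `site:$query` pattern to a list of queries. It only mimics the substitution using Python's `string.Template`; the function name `apply_query_format` and the sample queries are assumptions for illustration, not part of A-Parser.

```python
from string import Template


def apply_query_format(query_format, queries):
    """Substitute each query into the $query placeholder of the format string."""
    template = Template(query_format)
    return [template.substitute(query=q) for q in queries]


queries = ["a-parser.com", "seznam.cz"]
formatted = apply_query_format("site:$query", queries)

print(formatted)  # ['site:a-parser.com', 'site:seznam.cz']
```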
Result output options
A-Parser supports flexible result formatting thanks to the built-in templating engine Template Toolkit, so results can be output in any form, including structured formats such as CSV or JSON.
Exporting a list of links
Links + anchors + snippets with position output
Outputting links, anchors, and snippets in a CSV table
Saving in SQL format
Dump results to JSON
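As a rough illustration of the structured outputs listed above, the following Python sketch writes a few result rows as CSV and JSON. It is not an A-Parser template: the field names (`position`, `link`, `anchor`, `snippet`) and the sample rows are hypothetical, chosen only to mirror the data the scraper collects.

```python
import csv
import io
import json

# Hypothetical sample rows; real data would come from the scraper's output.
results = [
    {"position": 1, "link": "https://example.com/", "anchor": "Example",
     "snippet": "First, sample snippet"},
    {"position": 2, "link": "https://example.org/page", "anchor": "Another",
     "snippet": 'Snippet with "quotes"'},
]

FIELDS = ["position", "link", "anchor", "snippet"]


def to_csv(rows):
    """Return the rows as CSV text; commas and quotes are escaped by the csv module."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Return the rows as a JSON array, keeping non-ASCII characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


csv_text = to_csv(results)
json_text = to_json(results)
print(csv_text)
print(json_text)
```

Letting the `csv` and `json` modules handle quoting avoids broken rows when a snippet contains commas or quotation marks.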
Results processing
A-Parser can process results directly during scraping. This section lists the most popular use cases for the Seznam scraper:
Link deduplication
Link deduplication by domain
Extracting domains
Removing tags from anchors and snippets
Filtering links by inclusion
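The processing steps listed above can be sketched in plain Python for readers who post-process exported results outside A-Parser. This is a minimal sketch, not A-Parser's own implementation: all function names are hypothetical, and "domain" here means the URL's hostname, without reducing subdomains to a registrable domain.

```python
import re
from html import unescape
from urllib.parse import urlsplit


def domain_of(link):
    """Return the lowercase hostname of a link, or '' if it has none."""
    return urlsplit(link).hostname or ""


def unique_links(links):
    """Remove exact duplicate links, keeping first occurrences in order."""
    return list(dict.fromkeys(links))


def unique_by_domain(links):
    """Keep only the first link seen for each hostname."""
    seen, kept = set(), []
    for link in links:
        host = domain_of(link)
        if host not in seen:
            seen.add(host)
            kept.append(link)
    return kept


def strip_tags(text):
    """Drop HTML tags with a simple regex, then decode HTML entities."""
    return unescape(re.sub(r"<[^>]+>", "", text))


def filter_by_inclusion(links, needle):
    """Keep only the links that contain the given substring."""
    return [link for link in links if needle in link]


links = [
    "https://example.com/a",
    "https://example.com/a",
    "https://Example.com/b",
    "https://forum.example.org/thread?id=1",
]

print(unique_links(links))
print(unique_by_domain(links))
print([domain_of(link) for link in unique_by_domain(links)])
print(strip_tags("<b>Seznam</b> &amp; more"))
print(filter_by_inclusion(links, "forum"))
```

The regex-based tag removal is adequate for short anchors and snippets; full HTML documents would call for a real parser.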
Possible settings
Parameter name | Default value | Description |
---|---|---|
Pages count | 5 | Number of pages to scrape (from 1 to 50) |
Links per page | 10 | Number of links on one page (10 / 20) |
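
The two settings together bound how many results one query can return. The small Python sketch below shows that relationship under the limits stated in the table; the function `max_results` is a hypothetical helper, and the actual count may be lower if Seznam returns fewer results.

```python
def max_results(pages_count=5, links_per_page=10):
    """Upper bound on results per query for the given settings."""
    if not 1 <= pages_count <= 50:
        raise ValueError("Pages count must be between 1 and 50")
    if links_per_page not in (10, 20):
        raise ValueError("Links per page must be 10 or 20")
    return pages_count * links_per_page


print(max_results())        # 50 with the default settings
print(max_results(50, 20))  # 1000, the maximum per query
```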