SE::Baidu - Baidu Search Results Scraper
Scraper Overview
Scraper for Baidu search results. With the Baidu scraper, you can build large databases of links ready for further use. You can enter queries exactly as you would in the Baidu search bar, including search operators (filetype, site, intitle).
A-Parser functionality allows you to save Baidu scraper parsing settings for later use (presets), schedule parsing, and much more. You can use automatic query multiplication, substitution of subqueries from files, permutation of alphanumeric combinations and lists to get the maximum possible number of results.
The Baidu scraper can save results in whatever form and structure you need, thanks to the powerful built-in Template Toolkit engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Use Cases for the Scraper
🔗 Scraping full Baidu links
This resource shows how to scrape full links
🔗 Baidu Suggestions
Multi-level scraping of Baidu suggestions
🔗 JS scraper JS::SE::Baidu::Suggest
Creating JS scrapers. Obtaining Baidu suggestions
Data Collected
- Links
- Snippets
- Anchors
- Total number of results
- List of related words
- Number of search result pages
Capabilities
- Scrapes up to 5000 results per query
- Supports all Baidu search operators (filetype:, site:, intitle:)
- Collects search results and related keywords
- Converts truncated links into full ones (option Get full links)
Use Cases
- Collecting link databases - for A-Poster, XRumer, AllSubmitter, etc.
- Competition assessment for keywords
- Checking website indexing
- Collecting pages that contain specified keywords in the page title
Queries
Specify search phrases as queries, for example:
test
site:www.baidu.com
百度产品大全
intitle:парсер
Query Substitutions
You can use built-in macros for query multiplication. For example, to get a very large database of forums, specify several main queries in different languages:
forum
форум
foro
论坛
In the query format, specify a permutation of characters from a to zzzz. This method varies the search output as much as possible and yields many new unique results:
$query {az:a:zzzz}
This macro will create 475254 additional queries for each original search query, which in total gives 4 x 475254 = 1901016 search queries. That is an impressive figure, but it is not a problem for A-Parser at all: at a speed of 2000 queries per minute, such a task will be processed in just 16 hours.
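The figures above can be checked with a short Python sketch. It is not A-Parser code: it assumes that `{az:a:zzzz}` enumerates every lowercase Latin string from 1 to 4 characters long, and the helper names `az_count` and `az_strings` are hypothetical, introduced only for this illustration.

```python
from itertools import islice, product
from string import ascii_lowercase

# The four main queries from the example above.
QUERIES = ["forum", "форум", "foro", "论坛"]
RATE_PER_MINUTE = 2000  # scraping speed quoted in the text


def az_count(min_len, max_len, alphabet=ascii_lowercase):
    """Number of strings with lengths min_len..max_len over the alphabet."""
    return sum(len(alphabet) ** n for n in range(min_len, max_len + 1))


def az_strings(min_len, max_len, alphabet=ascii_lowercase):
    """Lazily yield 'a', 'b', ..., 'z', 'aa', ... up to max_len characters.

    Assumption: this mirrors what the {az:a:zzzz} macro enumerates.
    """
    for n in range(min_len, max_len + 1):
        for combo in product(alphabet, repeat=n):
            yield "".join(combo)


per_query = az_count(1, 4)            # 26 + 26**2 + 26**3 + 26**4
total = len(QUERIES) * per_query      # queries across all four seeds
hours = total / RATE_PER_MINUTE / 60  # processing time at the quoted speed

# First few expanded queries for the "$query {az:a:zzzz}" format.
sample = [f"{QUERIES[0]} {s}" for s in islice(az_strings(1, 4), 3)]

print(per_query, total, round(hours, 1))  # 475254 1901016 15.8
print(sample)                             # ['forum a', 'forum b', 'forum c']
```

The generator is lazy on purpose: counting the combinations is instant, while materializing all 1901016 queries in memory is unnecessary for an estimate.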
Using Operators
You can use search operators in the query format, so that they are automatically added to each query from your list:
site:$query
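As an illustration of the effect of such a query format, the following Python sketch applies a format string to a list of queries. It only imitates the `$query` substitution with the standard library's `string.Template`; the function name `apply_query_format` and the sample domains are hypothetical, and A-Parser's own implementation may differ.

```python
from string import Template


def apply_query_format(query_format, queries):
    """Substitute each query into the format in place of $query."""
    template = Template(query_format)
    return [template.substitute(query=q) for q in queries]


# Hypothetical input list: one domain per line, as in a query file.
domains = ["example.com", "example.org"]

for line in apply_query_format("site:$query", domains):
    print(line)
```

With this format, a plain list of domains becomes a list of `site:` queries, which is the pattern used for checking website indexing.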
Output Results Examples
A-Parser supports flexible formatting of results thanks to the built-in Template Toolkit, which allows it to output results in any form, including structured formats such as CSV or JSON.
Export list of links
Links + anchors + snippets with position output
Output links, anchors, and snippets in a CSV table
Saving related keywords
Keyword competition
Checking link indexing
Saving in SQL format
Dumping results to JSON
Results processing
A-Parser can process results directly during scraping. This section lists the most popular cases for the Baidu scraper.
Link deduplication
Link deduplication by domain
Extracting domains
Removing tags from anchors and snippets
Filtering links by inclusion
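To make the idea of deduplication by domain concrete, here is a minimal Python sketch of the same operation performed outside A-Parser on an already exported list of links. It does not show A-Parser's own settings; the function name `dedupe_by_domain` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def dedupe_by_domain(links):
    """Keep only the first link seen for each domain, preserving order."""
    seen = set()
    unique = []
    for link in links:
        # hostname is lowercased by urlsplit; it is None for malformed links.
        domain = urlsplit(link).hostname
        if domain is None or domain in seen:
            continue
        seen.add(domain)
        unique.append(link)
    return unique


links = [
    "https://example.com/thread/1",
    "https://example.com/thread/2",
    "http://EXAMPLE.com/index",
    "https://forum.example.org/",
]
print(dedupe_by_domain(links))
```

Note that this sketch treats `example.com` and `forum.example.org` as different domains because it compares full hostnames; grouping subdomains under one registered domain would need additional logic.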
Possible settings
Parameter name | Default value | Description |
---|---|---|
Pages count | 5 | Number of pages to scrape (from 1 to 100) |
Links per page | 50 | Number of links in the search results per page (10 / 20 / 50) |
Get full links | ☐ | Conversion of truncated links to full links (disabled by default) |
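The settings above can be modeled in a few lines of Python to see how they bound the number of results. This is a sketch only: the class `BaiduSettings` and its field names are hypothetical and not part of A-Parser, and it assumes the result ceiling is simply pages multiplied by links per page.

```python
from dataclasses import dataclass

ALLOWED_LINKS_PER_PAGE = (10, 20, 50)


@dataclass(frozen=True)
class BaiduSettings:
    """Hypothetical mirror of the settings table, with its defaults."""
    pages_count: int = 5
    links_per_page: int = 50
    get_full_links: bool = False

    def __post_init__(self):
        # Ranges taken from the Description column of the table.
        if not 1 <= self.pages_count <= 100:
            raise ValueError("pages_count must be between 1 and 100")
        if self.links_per_page not in ALLOWED_LINKS_PER_PAGE:
            raise ValueError("links_per_page must be 10, 20 or 50")

    def max_results(self):
        """Upper bound on results per query (assumed: pages x links)."""
        return self.pages_count * self.links_per_page


print(BaiduSettings().max_results())                 # 250
print(BaiduSettings(pages_count=100).max_results())  # 5000
```

Under this assumption, the maximum settings give 100 x 50 = 5000, which matches the stated limit of 5000 results per query.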