Results Builder
Results Builder lets you transform the results of each scraper before they are formatted and saved to disk.
Capabilities
- Splitting the result into parts using a regular expression or an arbitrary delimiter
- Replacing a substring in the result, either as a literal string or by regular expression
- Extracting the domain or main domain from a link
- Converting the result to uppercase or lowercase
- Removing HTML tags (<b>text</b> -> text)
- Converting HTML entities to their Unicode equivalents (&copy; -> ©)
- Retrieving data using XPath queries
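The splitting, replacing, and case-conversion operations listed above can be illustrated outside the tool with a short Python sketch. This is not how Results Builder is implemented; the sample string and variable names are assumptions chosen only for illustration.

```python
import re

# Hypothetical raw result, used only to illustrate the operations.
result = "Red, Green;Blue"

# Split on a regular expression (comma or semicolon, optional whitespace).
parts_regex = re.split(r"[,;]\s*", result)

# Split on an arbitrary literal delimiter.
parts_delim = result.split(", ")

# Replace a literal substring.
replaced_literal = result.replace("Red", "Black")

# Replace by regular expression.
replaced_regex = re.sub(r"[,;]\s*", " | ", result)

# Case conversion.
upper = result.upper()
lower = result.lower()

print(parts_regex, parts_delim, replaced_literal, replaced_regex, upper, lower)
```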
Examples
Parsing domains
Saving only domains when parsing links from search engines:
The source result is the link elements of the serp array from the first scraper (p1). The main domain extraction function is applied to each element, and the new result is saved under the same name (the link element in the serp array), so there is no need to change the result format.
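As a rough illustration of what replacing each link with its main domain means, here is a minimal Python sketch. The layout of serp as a list of dictionaries with a link key is an assumption, and extract_main_domain is a hypothetical helper that naively keeps the last two host labels; it gives wrong answers for suffixes such as co.uk, where a public suffix list would be needed.

```python
from urllib.parse import urlsplit


def extract_domain(link: str) -> str:
    """Return the lowercased host of a URL, without scheme, port, or path."""
    return urlsplit(link).hostname or ""


def extract_main_domain(link: str) -> str:
    """Naive heuristic (assumption): keep only the last two host labels."""
    host = extract_domain(link)
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


# Hypothetical stand-in for the serp array of the first scraper.
serp = [
    {"link": "https://www.example.com/page?x=1"},
    {"link": "http://blog.example.org:8080/post"},
]

# Overwrite each link in place, so the output format stays the same.
for item in serp:
    item["link"] = extract_main_domain(item["link"])

print(serp)
```

Because the value is written back under the same key, any code that formats the link field afterwards does not need to change.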
Parsing snippets with cleaning
Saving snippets from search engines with HTML tags removed and HTML entities converted.
By default, anchors and snippets are parsed with all nested tags, which allows you to save the same formatting as when viewing the search results. If only plain text is needed, you can use the capabilities of Results Builder:
In this example, two Results Builders are applied to the snippets in sequence: the first removes HTML tags, and the second converts HTML entities.
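The two-step cleanup can be sketched in Python as a small pipeline. This only illustrates the idea and is not the tool's own code: the function names are hypothetical, and the regular expression used to strip tags is a simplification that does not handle every HTML edge case.

```python
import html
import re

# Simplified tag pattern (assumption): anything between < and >.
TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags(text: str) -> str:
    """Drop HTML tags, keeping the text between them."""
    return TAG_RE.sub("", text)


def decode_html_entities(text: str) -> str:
    """Convert entities such as &copy; to their Unicode characters."""
    return html.unescape(text)


def clean_snippet(snippet: str) -> str:
    """Apply the two transformations one after the other."""
    for step in (remove_html_tags, decode_html_entities):
        snippet = step(snippet)
    return snippet


# Hypothetical snippet as it might come back with nested markup.
snippet = "<b>Example</b> &copy; 2024 &amp; more"
cleaned = clean_snippet(snippet)
print(cleaned)
```

Tags are removed before entities are decoded so that an escaped sequence such as &lt;b&gt; ends up as literal text and is not stripped as if it were a tag.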
Parsing with XPath
Parsing links from search results using XPath:
This example shows parsing links from the google.com search engine. The XPath query used is:
//*[@id="rso"]/div[3]/div/div[1]/a/@href
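To show how a query of this shape selects links, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample markup is invented for illustration and is not real Google output. ElementTree supports only a subset of XPath, so the sketch selects the a elements and reads their href attribute in Python, which is an adaptation of the query above and not the same expression.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup (assumption), not a real results page.
SAMPLE = """
<html><body>
<div id="rso">
  <div><div><div><a href="https://example.com/first">First</a></div></div></div>
  <div><div><div><a href="https://example.com/second">Second</a></div></div></div>
  <div><div>
    <div><a href="https://example.com/third">Third</a></div>
    <div><a href="https://example.com/ignored">Ignored</a></div>
  </div></div>
</div>
</body></html>
""".strip()

root = ET.fromstring(SAMPLE)

# ElementTree paths must be relative and cannot end in /@href,
# so select the <a> elements and read the attribute afterwards.
anchors = root.findall('.//*[@id="rso"]/div[3]/div/div[1]/a')
links = [a.get("href") for a in anchors]
print(links)
```

Real search result pages are rarely well-formed XML, so a dedicated HTML parser with full XPath support would normally be needed to run the original query, including its final /@href step.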