Results Builder
The Results Builder lets you transform the results from each scraper before they are formatted and saved to disk.
Capabilities
- Splitting the result into parts using a regular expression or an arbitrary delimiter
- Replacing a substring in the result, either as literal text or by regular expression
- Extracting the domain or main domain from a link
- Converting the result to upper or lower case
- Removing HTML tags (<b>text</b> -> text)
- Converting HTML entities into their Unicode equivalents (&copy; -> ©)
- Retrieving data using XPath queries
Examples
Domain parsing
Saving only domains when parsing links from search engines:
The source is the link elements of the serp array from the first scraper. A function that extracts the main domain from a link is applied to each element, and the new result is saved under the same name (the link element in the serp array), so the result format does not need to be changed.
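The same transformation can be sketched outside the tool to show what is applied to each element. This is only an illustration in Python, not the Results Builder's own implementation: the function names and the sample serp data are hypothetical, and the "main domain" is approximated naively as the last two labels of the host, which is wrong for suffixes such as co.uk (a real implementation would need a public suffix list).

```python
from urllib.parse import urlsplit


def extract_domain(link: str) -> str:
    """Return the host of a URL, lower-cased and without the port."""
    return urlsplit(link).hostname or ""


def extract_main_domain(link: str) -> str:
    """Naive assumption: the main domain is the last two host labels."""
    labels = extract_domain(link).split(".")
    return ".".join(labels[-2:])


# Hypothetical stand-in for the serp array produced by a scraper.
serp = [
    {"link": "https://www.example.com/page?x=1", "anchor": "Example"},
    {"link": "http://blog.example.org/post", "anchor": "Blog"},
]

# Overwrite each link in place, so the output format stays the same.
for item in serp:
    item["link"] = extract_main_domain(item["link"])
```

Because each value is written back under the same key, anything that formats the serp array afterwards keeps working unchanged, which mirrors the behaviour described above.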
Snippet parsing with cleaning
Saving snippets from search engines with HTML tags removed and HTML entities converted.
By default, anchors and snippets are parsed with all nested tags, which preserves the formatting shown in the search engine output. If you only need plain text, you can use the Results Builder:
In this example, two Results Builders are applied to the snippets in sequence: the first removes HTML tags, and the second converts HTML entities.
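The two-step order (tags first, entities second) can be illustrated with a small Python sketch. It is an assumption-laden stand-in for the two builders, not their actual code: the helper names and the sample snippet are hypothetical, and only the standard library is used.

```python
import html
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text and leaves entities untouched for the second step."""

    def __init__(self):
        # convert_charrefs=False keeps &amp; and friends as-is in this step.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def remove_html_tags(text: str) -> str:
    """Step 1: drop tags, keep the text between them."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def decode_html_entities(text: str) -> str:
    """Step 2: turn entities such as &copy; into Unicode characters."""
    return html.unescape(text)


# Hypothetical snippet as it might arrive with nested tags.
snippet = "<b>Fast</b> &amp; simple &copy; 2020"
plain = decode_html_entities(remove_html_tags(snippet))
```

Running the steps in this order matters: decoding entities first could turn an escaped `&lt;b&gt;` into a real tag, which the tag-removal step would then strip.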
Parsing using XPath
Parsing links from search results using XPath:
This example parses links from Google search results using the following XPath query:
//*[@id="rso"]/div[3]/div/div[1]/a/@href
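To show how such a query walks the document, here is a rough Python sketch using the standard library's xml.etree.ElementTree. Several assumptions apply: the markup below is a simplified, hypothetical stand-in for a results page, ElementTree only handles well-formed XML (real pages usually need an HTML-aware parser), and its limited XPath dialect has no /@href step, so the attribute is read from the matched elements instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a search results page.
PAGE = """
<html><body>
<div id="rso">
  <div>first block</div>
  <div>second block</div>
  <div>
    <div>
      <div><a href="https://example.com/first">First</a></div>
      <div><a href="https://example.com/cached">Cached</a></div>
    </div>
    <div>
      <div><a href="https://example.org/second">Second</a></div>
    </div>
  </div>
</div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

# Same path as the query above, minus the final /@href step:
# third div under #rso, each div inside it, that div's first div, then a.
anchors = root.findall(".//*[@id='rso']/div[3]/div/div[1]/a")

# Read the href attribute from every matched anchor.
links = [a.get("href") for a in anchors]
```

A query built on positions such as div[3] depends on the exact page layout, so it stops matching when the layout changes.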