Result Filters

A-Parser can filter results by a set of defined rules so that only the data you need is saved.

Using Filtering

Typical use cases:

  • Saving only links that contain a given substring (for example, a CMS-specific marker)
  • Filtering a database of domains by certain parameters (for example, Yandex IKS of 300 or higher, language RU)
  • Checking the server response (for example, 200 OK or the presence of certain headers)
  • Checking whether the snippet contains the original query
  • Any other case where results must be limited by certain conditions

Filters can be added in the Task Editor by clicking on the tool icon next to the required scraper:

Filter option in the Task Editor

Types of Filtering

You can filter both individual results and arrays of results. There are several types of filters:

  • By equality or inequality of strings
  • By the presence or absence of a substring
  • By matching or not matching a regular expression
  • By numeric comparison: greater than, less than, or equal to
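
The filter types above can be thought of as simple predicates. The sketch below is only an illustration in Python, not A-Parser's implementation; all function names are hypothetical.

```python
import re


# Hypothetical helpers that mirror the filter types listed above.
def string_equals(value: str, expected: str) -> bool:
    """Equality filter (negate the result for inequality)."""
    return value == expected


def contains(value: str, needle: str) -> bool:
    """Substring filter (negate the result for 'does not contain')."""
    return needle in value


def matches(value: str, pattern: str) -> bool:
    """Regular-expression filter: true if the pattern is found anywhere."""
    return re.search(pattern, value) is not None


def greater_than(value: float, threshold: float) -> bool:
    """Numeric filter; 'less than' and 'equal to' work the same way."""
    return value > threshold
```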

Operation Features

  • When filtering arrays of results, only the elements that match the filter remain in the array
  • When filtering simple results, a result that does not match the filter causes the whole result for that query to be skipped, including when multiple scrapers are used
  • When a task uses two or more filters, they are combined with a logical AND: a result is saved only if it satisfies all filters at once
  • The comparison value field (string, regular expression, or number) accepts Template Toolkit templates; the same variables are available as in the General Result Format
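
The rules above can be modeled roughly as follows. This is a sketch under assumptions, not A-Parser's internal logic: `filter_array` and `filter_simple` are hypothetical names, and a skipped result is represented as `None`.

```python
from typing import Callable, List, Optional

Predicate = Callable[[str], bool]


def filter_array(items: List[str], filters: List[Predicate]) -> List[str]:
    """Array result: keep only the elements that pass every filter (AND)."""
    return [item for item in items if all(f(item) for f in filters)]


def filter_simple(value: str, filters: List[Predicate]) -> Optional[str]:
    """Simple result: if any filter fails, the result is skipped (None)."""
    return value if all(f(value) for f in filters) else None


# Two filters combined with a logical AND.
filters: List[Predicate] = [
    lambda s: "forum" in s,          # contains a substring
    lambda s: s.startswith("https")  # an additional condition
]

links = ["https://a.example/forum/", "http://b.example/forum/", "https://c.example/"]
kept = filter_array(links, filters)
skipped = filter_simple("http://b.example/forum/", filters)
```

Here `kept` holds only the first link, because it is the only one that satisfies both conditions, and `skipped` is `None`.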

Examples

Filtering by Text on a Page

Checking a database of websites for the presence of certain text on the page
The queries are a file of links; the result is a filtered file containing only the links whose pages contain the search text.

We use the Net::HTTP scraper to download each page and save the query (the link being checked) in the result. The filter is applied to $data, the content of the downloaded page: set the filter type to Contains string and specify the string itself:

Example of filtering by text on a page
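
The key point of this example is that the filter inspects one value (the page content) while a different value (the query) is saved. A minimal Python sketch of that idea, using an in-memory dictionary instead of real downloads; the function name and the sample data are hypothetical:

```python
from typing import Dict, List


def keep_matching_queries(pages: Dict[str, str], needle: str) -> List[str]:
    """Return the links (queries) whose page content contains the needle."""
    return [url for url, data in pages.items() if needle in data]


# Stand-in for downloaded pages: link -> page content.
pages = {
    "http://example.com/": "<html>Powered by WordPress</html>",
    "http://example.org/": "<html>Hello</html>",
}

kept = keep_matching_queries(pages, "Powered by WordPress")
```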

Filtering Images by Size

Filtering images by resolution
When scraping with SE::Google::Images, we filter on both the height and the width of each image, saving only images larger than 500x500 pixels:

Example of filtering images by size
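
In effect this is two numeric "greater than" filters joined by AND and applied to an array of results. A rough Python equivalent, assuming each image is a dictionary with hypothetical `width` and `height` keys:

```python
from typing import Dict, List

Image = Dict[str, object]


def larger_than(images: List[Image], min_width: int = 500, min_height: int = 500) -> List[Image]:
    """Keep images whose width AND height both exceed the thresholds."""
    return [
        img for img in images
        if img["width"] > min_width and img["height"] > min_height
    ]


images = [
    {"link": "http://example.com/a.jpg", "width": 1024, "height": 768},
    {"link": "http://example.com/b.jpg", "width": 800, "height": 400},
    {"link": "http://example.com/c.jpg", "width": 500, "height": 500},
]

big = larger_than(images)
```

Only the first image is kept: the second fails the height check, and the third is exactly 500x500, which is not strictly larger.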

Filtering by Multiple Attributes

Filtering links by the occurrence of any of several different strings
To keep links that contain any one of several strings, we use a regular-expression filter and write the markers as alternatives joined by a separator (in regular-expression syntax, alternation is written |):

showthread\.php
/forum/
viewtopic\.php\?t=
Note: regular expressions require certain characters to be escaped, such as . and ? in the patterns above.

Example of filtering by several criteria
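
To show why escaping matters, here is a small Python sketch that builds one alternation pattern from plain strings. It uses Python's `re` module rather than A-Parser itself, and the sample links are invented.

```python
import re

# Plain-text markers; re.escape adds the backslashes for us.
markers = ["showthread.php", "/forum/", "viewtopic.php?t="]
pattern = re.compile("|".join(re.escape(m) for m in markers))

links = [
    "http://site.example/forum/index.php",
    "http://site.example/viewtopic.php?t=42",
    "http://site.example/showthreadXphp",  # no literal ".", must not match
    "http://site.example/blog/post",
]

forum_links = [link for link in links if pattern.search(link)]
```

Without escaping, the `.` in `showthread.php` would match any character, so the third link would pass the filter by mistake.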

Using a Query in a Filter

Saving Google snippets that contain the original query
As the comparison string, we specify [% query %], a variable that contains the query, and select Case sensitive so that the substring search respects letter case:

Example of using a query in a filter
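
The effect of the Case sensitive option can be illustrated with a short Python sketch. The function name and sample snippet are hypothetical, and this is not how A-Parser evaluates the template internally.

```python
def snippet_contains_query(snippet: str, query: str, case_sensitive: bool = True) -> bool:
    """Substring check of the query against the snippet."""
    if not case_sensitive:
        # Normalize both sides so that letter case is ignored.
        snippet, query = snippet.casefold(), query.casefold()
    return query in snippet


snippet = "Buy cheap Laptops online"
strict = snippet_contains_query(snippet, "laptops")
relaxed = snippet_contains_query(snippet, "laptops", case_sensitive=False)
```

With case sensitivity on, `strict` is `False` because "laptops" and "Laptops" differ in case; `relaxed` is `True`.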