Result Filters

A-Parser can filter results according to a set of rules so that only the data you need is saved.

Using Filters

The main ways to use filters are:

  • Saving only those links that contain a specific string (for example, a CMS signature)
  • Filtering a domain database by specific parameters (for example, Yandex IKS of 300 or higher, Alexa rank up to 100000, language RU)
  • Checking the server response (for example, 200 OK or the content of specific headers)
  • Checking that the snippet contains the original query
  • Any other use cases where it is necessary to limit results according to specific conditions

Filters can be added in the Task Editor by clicking the tool icon next to the relevant scraper:

Filter option in the Task Editor

Types of Filtering

Both single results and arrays of results can be filtered (see Representation of Results). The following filter types are available:

  • By equality or inequality of strings
  • By the presence or absence of a substring
  • By matching or not matching a regular expression
  • By numeric comparison: greater than, less than, or equal to
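
The filter types listed above can be pictured as simple predicates. The following is only an illustrative Python sketch, not A-Parser code: filters are configured in the Task Editor, and the names `FILTER_TYPES` and `passes` are hypothetical. It also assumes that a regular expression may match anywhere in the value.

```python
import re
from typing import Callable, Dict, Union

Value = Union[str, int, float]

# Hypothetical mapping of filter types to predicates (value, argument) -> bool.
# These names are illustrative and do not come from A-Parser.
FILTER_TYPES: Dict[str, Callable[[Value, Value], bool]] = {
    # Equality or inequality of strings
    "string_equals": lambda value, arg: str(value) == str(arg),
    "string_not_equals": lambda value, arg: str(value) != str(arg),
    # Presence or absence of a substring
    "contains": lambda value, arg: str(arg) in str(value),
    "not_contains": lambda value, arg: str(arg) not in str(value),
    # Matching or not matching a regular expression (assumed: match anywhere)
    "regex_matches": lambda value, arg: re.search(str(arg), str(value)) is not None,
    "regex_not_matches": lambda value, arg: re.search(str(arg), str(value)) is None,
    # Numeric comparisons
    "greater_than": lambda value, arg: float(value) > float(arg),
    "less_than": lambda value, arg: float(value) < float(arg),
    "number_equals": lambda value, arg: float(value) == float(arg),
}


def passes(filter_type: str, value: Value, arg: Value) -> bool:
    """Return True if the value satisfies the given filter type."""
    return FILTER_TYPES[filter_type](value, arg)
```

For instance, `passes("greater_than", "350", 300)` would correspond to a numeric "greater than 300" filter applied to a result value of 350.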

Features of Filtering

  • When an array of results is filtered, only the items that pass the filter remain in the array
  • When a single result is filtered and does not pass the filter, the whole result for that query is skipped, even when several scrapers are used
  • When a task uses two or more filters, they are combined with logical AND: a result is saved only if it satisfies all filters at once
  • The comparison value (string, regular expression, or number) can contain Template Toolkit templates; the same variables are available as in the General Result Format
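
The difference between filtering arrays and single results, and the AND logic between filters, can be sketched as follows. This is a minimal Python model of the behaviour described above, not A-Parser's implementation; the function names and the sample data are hypothetical.

```python
from typing import Callable, List, Optional

Predicate = Callable[[dict], bool]


def filter_array(items: List[dict], filters: List[Predicate]) -> List[dict]:
    """Keep only the array items that satisfy every filter (logical AND)."""
    return [item for item in items if all(f(item) for f in filters)]


def filter_single(result: dict, filters: List[Predicate]) -> Optional[dict]:
    """Return the result if it satisfies every filter; None means the query is skipped."""
    return result if all(f(result) for f in filters) else None


# Hypothetical array of results: images with their dimensions.
images = [
    {"link": "a.jpg", "width": 800, "height": 600},
    {"link": "b.jpg", "width": 640, "height": 480},
    {"link": "c.jpg", "width": 1024, "height": 768},
]
# Two filters combined with AND: width > 500 and height > 500.
size_filters = [lambda img: img["width"] > 500, lambda img: img["height"] > 500]
large_images = filter_array(images, size_filters)

# Hypothetical single result: a downloaded page.
page = {"query": "http://example.com/", "data": "<html>Powered by WordPress</html>"}
kept = filter_single(page, [lambda r: "Powered by WordPress" in r["data"]])
skipped = filter_single(page, [lambda r: "Joomla" in r["data"]])
```

In this model, `b.jpg` is dropped from the array because its height fails the second filter, while a single result that fails any filter yields nothing at all for that query.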

Examples

Filtering by Text on a Page

Checking the site database for the presence of specific text on the page.

We use a file with links as queries, and as a result, we get a file with links where the desired text is found.

We use the Net::HTTP scraper to download the page and save the query (the link being checked) as the result. The filter is applied to $data, the content of the downloaded page; the filter type is Contain string, and the value is the string to look for:

Example of filtering by text on a page

Filtering Images by Size

Filtering images by resolution.

We filter by image height and width when scraping with SE::Google::Images, saving only images that are larger than 500x500 pixels:

Example of filtering images by size

Filtering by Several Attributes

Filtering links by the presence of any of several different strings.

To match any of several strings, we use a regular expression filter and list the alternatives separated by the | delimiter:

showthread\.php|/forum/|viewtopic\.php\?t=
Note: regular expressions require certain characters to be escaped.

Example of filtering by several attributes
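
To illustrate why escaping matters, here is a small Python sketch that builds such an expression from plain strings. It is an assumption-laden illustration rather than A-Parser code: it joins the alternatives with `|`, the standard regex alternation operator, and the sample links are made up.

```python
import re

# Literal strings we want to find in a link (taken from the example above).
signatures = ["showthread.php", "/forum/", "viewtopic.php?t="]

# re.escape puts a backslash before regex metacharacters such as "." and "?",
# so they are matched literally instead of acting as wildcards or quantifiers.
pattern = "|".join(re.escape(s) for s in signatures)
link_filter = re.compile(pattern)

# Hypothetical links to check.
links = [
    "http://example.com/forum/index.html",
    "http://example.org/viewtopic.php?t=42",
    "http://example.net/about.html",
    "http://example.net/showthreadXphp",
]
matching = [link for link in links if link_filter.search(link)]
```

Without escaping, the `.` in `showthread.php` would match any character, so the last link would pass the filter by mistake.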

Filtering by a Specific Parameter

Saving sites with a specific Alexa Rank.

We save only sites with an Alexa Rank greater than 100:

Example of filtering by a specific parameter

Using a Query in a Filter

Saving Google snippets that contain the original query.

As the comparison string we explicitly specify [% query %], a variable that holds the query, and select Sensitive so that the substring search is case-sensitive:

Using a query in a filter
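
The effect of the case-sensitivity option can be sketched in Python. This is only a model of the check described above; the function name `snippet_contains_query` and the sample results are hypothetical, and in A-Parser the query is substituted through the [% query %] template variable rather than passed as an argument.

```python
def snippet_contains_query(snippet: str, query: str, case_sensitive: bool = True) -> bool:
    """Check whether the snippet contains the query as a substring."""
    if not case_sensitive:
        # Normalise both strings so that letter case is ignored.
        snippet, query = snippet.casefold(), query.casefold()
    return query in snippet


# Hypothetical results: each one carries the query that produced it.
results = [
    {"query": "Template Toolkit", "snippet": "Template Toolkit is a template engine"},
    {"query": "Template Toolkit", "snippet": "An introduction to template toolkit syntax"},
]

sensitive = [r for r in results if snippet_contains_query(r["snippet"], r["query"])]
insensitive = [
    r for r in results
    if snippet_contains_query(r["snippet"], r["query"], case_sensitive=False)
]
```

With a case-sensitive comparison only the first snippet is kept; ignoring case keeps both.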