
Result Filters

A-Parser can filter results against a set of rules you define, so that only the data you need is saved to the output.

Using filtering

Main use cases:

  • Saving only the links that contain a specific substring (e.g., a CMS signature)
  • Filtering a domain database by specific parameters (e.g., Yandex TCI of 300 or higher, RU language)
  • Checking the server response (e.g., 200 OK or the presence of specific headers)
  • Checking whether the snippet contains the source query
  • Any other case where results need to be limited by certain conditions

Filters can be added in the Task Editor by clicking the tool icon next to the required scraper:

Filter option in the Task Editor

Filter types

You can filter both single results and arrays of results. There are several types of filters:

  • By equality or inequality of strings
  • By presence or absence of a substring
  • By matching or not matching a regular expression
  • By numeric comparison: greater than, less than, or equal to
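To make the filter types concrete, here is a minimal Python sketch that models each one as a predicate over a result value. The helper names (equals, contains, matches, greater_than, and so on) are hypothetical and are not part of A-Parser; the sketch also assumes that a regular expression may match anywhere in the value rather than having to match it in full.

```python
import re
from typing import Callable

# Each filter is modelled as a predicate over the result value (a string).
# These helper names are hypothetical and not part of A-Parser.
Predicate = Callable[[str], bool]

def equals(s: str) -> Predicate:
    return lambda v: v == s

def not_equals(s: str) -> Predicate:
    return lambda v: v != s

def contains(s: str) -> Predicate:
    return lambda v: s in v

def not_contains(s: str) -> Predicate:
    return lambda v: s not in v

def matches(pattern: str) -> Predicate:
    rx = re.compile(pattern)
    # Assumption: the pattern may match anywhere in the value (search, not full match)
    return lambda v: rx.search(v) is not None

def not_matches(pattern: str) -> Predicate:
    rx = re.compile(pattern)
    return lambda v: rx.search(v) is None

# Numeric filters convert the value to a number before comparing
def greater_than(n: float) -> Predicate:
    return lambda v: float(v) > n

def less_than(n: float) -> Predicate:
    return lambda v: float(v) < n

def num_equals(n: float) -> Predicate:
    return lambda v: float(v) == n
```

Keeping numeric comparison separate from string comparison matters: compared as strings, "250" sorts after "1000", whereas greater_than(300) correctly rejects "250" and accepts "1000".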

Usage features

  • When filtering result arrays, only the elements that meet the filter condition remain in the array
  • When filtering simple results, if a result does not meet the filter condition, the entire result for that query is skipped, even when the task uses several scrapers
  • When a task has two or more filters, they are combined with a logical AND: a result is saved only if it meets the conditions of all filters at once
  • When specifying the comparison value (a string, regular expression, or numerical value), you can use the Template Toolkit templating engine; all variables similar to those for are available
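The difference between array and simple results, together with the AND rule, can be sketched in Python as follows. This is a simplified model, not A-Parser's internals: filter_array and filter_simple are hypothetical names, and returning None stands in for "the whole result for this query is skipped".

```python
from typing import Callable, Optional

Predicate = Callable[[str], bool]

def filter_array(items: list[str], filters: list[Predicate]) -> list[str]:
    # Array result: drop only the elements that fail; the rest stay in the array.
    # Multiple filters are combined with AND via all().
    return [item for item in items if all(f(item) for f in filters)]

def filter_simple(value: str, filters: list[Predicate]) -> Optional[str]:
    # Simple result: one failing filter means the whole result for the
    # query is skipped, modelled here as returning None.
    return value if all(f(value) for f in filters) else None
```

For example, filtering a list of links with two filters ("contains /forum/" and "contains lang=ru") keeps only the links that satisfy both, while a simple status value of "404" checked against "equals 200" causes the entire query result to be dropped.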

Examples

Filtering by text on the page

Checking a site database for the presence of specific text on a page
The queries are a file of links; the output is a filtered file containing only the links whose pages contain the desired text.

We use the Net::HTTP scraper to download each page and save the query (the link being checked) in the result. We filter on the $data result (the content of the downloaded page), set the filter type to Contains string, and specify the string itself:

Example of filtering by text on a page
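Outside A-Parser, the same logic looks roughly like the Python sketch below: download each link, check the body for a substring, and keep the link itself. fetch and links_containing are hypothetical helpers, urllib is only a rough stand-in for Net::HTTP, and the sketch assumes that a page which cannot be downloaded counts as "no match". The comparison here is case-sensitive.

```python
import urllib.request
from typing import Callable, Iterable

def fetch(url: str, timeout: float = 10.0) -> str:
    # Rough stand-in for Net::HTTP: download the page and return its body,
    # i.e. what the $data result would hold
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def links_containing(urls: Iterable[str], needle: str,
                     get: Callable[[str], str] = fetch) -> list[str]:
    kept = []
    for url in urls:
        try:
            data = get(url)
        except OSError:
            # Assumption: a page that cannot be downloaded counts as "no match"
            continue
        # "Contains string" filter on $data; the query (the link) is what gets saved
        if needle in data:
            kept.append(url)
    return kept
```

The get parameter lets you pass a different downloader, for example a dictionary lookup when testing without network access.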

Filtering images by size

Filtering images by resolution
We filter the height and width of each image when scraping via SE::Google::Images, saving only images larger than 500x500 pixels:

Example of filtering images by size
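The size filter amounts to two numeric "greater than" conditions joined by AND and applied to an array of images. In the Python sketch below, the Image class and its width/height fields are a hypothetical shape for one element of the image results, not the scraper's actual output format.

```python
from dataclasses import dataclass

@dataclass
class Image:
    # Hypothetical shape of one element of the images array
    url: str
    width: int
    height: int

def larger_than(images: list[Image], min_width: int = 500, min_height: int = 500) -> list[Image]:
    # Two "greater than" filters (width and height) joined by AND;
    # failing images are removed from the array, the rest are kept
    return [img for img in images if img.width > min_width and img.height > min_height]
```

Because the comparison is strict, an image exactly 500 pixels wide is dropped; use "greater than 499" if 500 itself should pass.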

Filtering by several attributes

Filtering links that contain any of several different strings
To filter links by any of several different strings, we use a regular expression and list the strings as alternatives separated by the | delimiter:

showthread\.php|/forum/|viewtopic\.php\?t=
Note that regular expressions require some characters to be escaped: in the example above, the dot and the question mark are written as \. and \?.

Example of filtering by several attributes
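The alternation can be checked with Python's re module, which uses the same | operator and the same escaping rules for these characters. FORUM_RE and forum_links are illustrative names, and the sketch assumes the pattern may match anywhere in the link.

```python
import re

# The three attributes joined with the regex alternation operator "|";
# "." and "?" are escaped so they match literally
FORUM_RE = re.compile(r"showthread\.php|/forum/|viewtopic\.php\?t=")

def forum_links(links: list[str]) -> list[str]:
    # Keep a link if any of the alternatives occurs anywhere in it
    return [link for link in links if FORUM_RE.search(link)]
```

Without the escapes, showthread.php would also match a string such as "showthreadXphp", because an unescaped dot matches any character.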

Using the query in a filter

Saving Google snippets that contain the source query
As the comparison string, we specify the [% query %] variable, which holds the query, and select Case sensitive to make the substring search case-sensitive:

Example of using a query in a filter
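Conceptually, the template is rendered first and the result is then used as an ordinary "contains" string. The Python sketch below mimics this with a tiny render function that only substitutes [% name %] placeholders; it is not Template Toolkit. The case-insensitive branch is an assumption about how the filter behaves when Case sensitive is not selected.

```python
def render(template: str, variables: dict[str, str]) -> str:
    # Tiny stand-in for Template Toolkit: substitutes "[% name %]" placeholders only
    for name, value in variables.items():
        template = template.replace(f"[% {name} %]", value)
    return template

def snippets_with_query(snippets: list[str], query: str,
                        case_sensitive: bool = True) -> list[str]:
    # The comparison string is the rendered "[% query %]" template
    needle = render("[% query %]", {"query": query})
    if case_sensitive:
        return [s for s in snippets if needle in s]
    # Without "Case sensitive", compare case-insensitively (an assumption about the default)
    return [s for s in snippets if needle.casefold() in s.casefold()]
```

With the query "red shoes", a case-sensitive search keeps "red shoes on sale" but drops "Buy Red Shoes online"; the case-insensitive variant keeps both.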