Using filters

Feb 8, 2016

  • Concept of filtering results

    A-Parser can filter results against a set of rules and save only the data you need. Typical use cases:
    • Saving only those links that contain a certain substring (for example, a CMS signature)
    • Filtering a domain database by certain parameters (for example, PR of 3 and above, Alexa rank up to 100000, English language)
    • Verifying the server response (for example, 200 OK or the contents of certain headers)
    • Checking whether the original query appears in a snippet
    • Any other case where results must be restricted by certain conditions

    Filters are added in the Task editor by clicking the tool icon next to the required parser.

    Both single results and arrays of results can be filtered (see Representation of results). The following filter types are available:
    • String equality or inequality
    • Presence or absence of a substring
    • Match or non-match against a regular expression
    • Numeric comparison: greater than, less than, or equal to
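
    The filter types listed above can be thought of as simple predicates. The following Python sketch is only an illustration of that idea: the function names are hypothetical and are not A-Parser identifiers, and A-Parser itself may implement the comparisons differently.

```python
import re

# Hypothetical predicate names that mirror the filter types listed above.
def string_equals(value, expected):
    return value == expected

def contains(value, needle):
    return needle in value

def matches(value, pattern):
    # re.search finds the pattern anywhere in the value
    return re.search(pattern, value) is not None

def number_greater(value, threshold):
    return float(value) > threshold

def number_less(value, threshold):
    return float(value) < threshold

def number_equals(value, threshold):
    return float(value) == threshold

def negate(predicate):
    """Build the opposite filter (not equal, does not contain, no match)."""
    return lambda value, arg: not predicate(value, arg)

not_contains = negate(contains)
```

    Each "positive" filter has a negated counterpart, which is why a single negate helper is enough in this sketch.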

    How filters work

    • When an array of results is filtered, only the elements that match the filter remain in the array
    • When a single result is filtered and it does not match the filter, the entire result for that query is skipped, including when several parsers are used
    • When a task uses two or more filters, they are combined with logical AND: a result is kept only if it satisfies all filters at once
    • The comparison value (string, regular expression, or number) may use Template Toolkit; all the variables available in the General format of results can be used
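
    The difference between array and single-result filtering, and the logical AND between filters, can be sketched in Python as follows. This is a minimal model of the behavior described above, not A-Parser code; the function names and sample data are made up for illustration.

```python
def filter_array(items, predicates):
    """Array result: keep only the elements that satisfy every predicate."""
    return [item for item in items if all(p(item) for p in predicates)]

def filter_single(record, field, predicates):
    """Single result: return the record if the field passes every predicate,
    otherwise None, meaning the whole result for this query is skipped."""
    if all(p(record[field]) for p in predicates):
        return record
    return None

links = [
    "http://a.example/wp-login.php",
    "http://b.example/",
    "https://c.example/wp-admin/",
]
has_wp = lambda link: "wp-" in link
is_https = lambda link: link.startswith("https://")

# Two filters are ANDed: only the link passing both remains in the array.
kept = filter_array(links, [has_wp, is_https])

# The response code is not 200, so the whole record is dropped.
skipped = filter_single(
    {"query": "http://b.example/", "code": 404}, "code", [lambda c: c == 200]
)
```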

    Usage examples

    Checking a list of sites for certain text on the page

    The queries come from a file with links; the result is a file with only those links whose pages contain the required text.
    We use the Net::HTTP parser to download each page and save the query (the link being checked) in the result. We filter the result $data, the content of the downloaded page, with the filter type Contain string and specify the string to look for.
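
    The logic of this example can be sketched in Python. The dictionary below is a stand-in for pages that have already been downloaded (link mapped to page content), and the function name and sample text are assumptions made for illustration only.

```python
def links_containing(pages, needle):
    """Return the links whose page content contains the needle."""
    return [link for link, content in pages.items() if needle in content]

# Stand-in for downloaded pages: link (the query) -> page content ($data).
pages = {
    "http://one.example/": "<html><body>Powered by ExampleCMS</body></html>",
    "http://two.example/": "<html><body>Hello</body></html>",
}

# Only the link is saved; the page content is used for filtering alone.
result = links_containing(pages, "Powered by ExampleCMS")
```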

    Filtering images by resolution

    We filter on the height and width of each image when parsing through SE::Google::Images and save only the images larger than 500x500 pixels.
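
    In effect this is two numeric "greater than" filters, one on width and one on height, joined by logical AND. A rough Python equivalent, with a hypothetical Image record and sample data, might look like this:

```python
from dataclasses import dataclass

@dataclass
class Image:
    # Hypothetical record; the field names are chosen for this sketch.
    url: str
    width: int
    height: int

def larger_than(images, min_width, min_height):
    """Keep images whose width and height both exceed the minimums."""
    return [
        img for img in images
        if img.width > min_width and img.height > min_height
    ]

images = [
    Image("http://img.example/a.jpg", 1024, 768),
    Image("http://img.example/b.jpg", 640, 480),   # height too small
    Image("http://img.example/c.jpg", 500, 500),   # not strictly larger
]
big = larger_than(images, 500, 500)
```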

    Filtering links by the presence of any of several strings

    To keep links that contain any of several different strings, we use a regular expression filter and list the signatures with a separator between them.
    Note that some characters must be escaped in regular expressions.
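
    As an illustration of the same idea outside A-Parser, the Python sketch below joins several signatures with the regex alternation character "|" and escapes metacharacters with re.escape. The signatures and links are made-up examples, and the use of "|" as the separator is an assumption based on standard regular expression syntax.

```python
import re

# Hypothetical CMS signatures; "." and "?" are regex metacharacters.
signatures = ["wp-login.php", "index.php?option=com_content", "/user/register"]

# re.escape protects the metacharacters; "|" joins the alternatives.
pattern = re.compile("|".join(re.escape(s) for s in signatures))

links = [
    "http://a.example/wp-login.php",
    "http://b.example/wp-loginXphp",  # would match if "." were left unescaped
    "http://c.example/index.php?option=com_content&id=1",
    "http://d.example/about",
]
matched = [link for link in links if pattern.search(link)]
```

    Without escaping, the dot would match any character and the question mark would act as a quantifier, so the filter could keep links that do not actually contain the signature.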

    Saving sites with a certain Google PageRank

    We save only the sites with PR greater than 4.

    Saving Google snippets that contain the original query

    As the string to compare against we specify [% query %], a variable that holds the query, and select Insensitive so that the substring search ignores letter case.
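
    A rough Python model of this filter is shown below. The render function is a deliberately tiny stand-in that only substitutes [% name %] placeholders; it is not Template Toolkit, and the function names and sample snippets are assumptions for illustration.

```python
def render(template, variables):
    """Tiny stand-in for Template Toolkit: replaces [% name %] only."""
    for name, value in variables.items():
        template = template.replace("[% " + name + " %]", value)
    return template

def snippet_matches(snippet, filter_value, query):
    """Case-insensitive substring check against the rendered filter value."""
    needle = render(filter_value, {"query": query})
    # casefold() normalizes case on both sides before comparing
    return needle.casefold() in snippet.casefold()

snippets = ["Buy Cheap Flights to Rome today", "Hotels in Paris"]
kept = [s for s in snippets if snippet_matches(s, "[% query %]", "cheap flights")]
```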