Usage of the unique feature

Dec 25, 2015


  • Unique, deduplication, removing duplicates: all of these mean that repeated results are not wanted.
    A-Parser has two methods of making results unique; each is covered in detail below.

    Unique results by string


    This method is applied after the result has been formatted (see Basic principles of formatting): just before a result is written to the file, every string is checked for uniqueness, and only new, unique strings are written.
    Uniqueness by string can be enabled in Quick Task:
    [​IMG]

    or in Add Task:
    [​IMG]
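
    The by-string check described above can be sketched in a few lines. This is only an illustration of the idea, not A-Parser's actual code; the function name write_unique_lines and the sample URLs are assumptions made for the example.

```python
import io


def write_unique_lines(results, out, seen=None):
    """Write each formatted result line to `out` only the first time it is seen.

    Hypothetical helper: it mimics a check done just before the write step.
    """
    seen = set() if seen is None else seen
    written = 0
    for line in results:
        if line in seen:
            continue  # duplicate string, skip the write
        seen.add(line)
        out.write(line + "\n")
        written += 1
    return written


buffer = io.StringIO()  # stands in for the results file
count = write_unique_lines(
    ["http://a.com/", "http://b.com/", "http://a.com/"], buffer
)
```

    Because the whole string is compared, two lines that differ by a single character are both kept.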


    Unique by any result


    Unique by any result applies deduplication directly to a chosen result of a chosen parser (see Representation of results in a parser). To add this type of unique in the Task editor, click the tool icon to the right of the parser and then click Add unique result:
    [​IMG]
    Now you can select which result to make unique and the type of unique:
    [IMG]

    The Global switch applies when two or more parsers are selected: it determines whether uniqueness is enforced globally across all parsers or separately for each parser.
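
    The difference between global and per-parser uniqueness can be illustrated with a small sketch. The names filter_results, parser_a and parser_b are hypothetical and only serve the example; how A-Parser implements the switch internally is not described here.

```python
from collections import defaultdict


def filter_results(results, global_unique):
    """Keep the first occurrence of each value.

    `results` is an iterable of (parser_name, value) pairs. With
    global_unique=True one shared set is used for all parsers;
    otherwise each parser gets its own set.
    """
    seen = defaultdict(set)
    kept = []
    for parser, value in results:
        scope = "*" if global_unique else parser
        if value in seen[scope]:
            continue
        seen[scope].add(value)
        kept.append((parser, value))
    return kept


results = [
    ("parser_a", "http://a.com/"),
    ("parser_b", "http://a.com/"),
    ("parser_a", "http://a.com/"),
]
kept_global = filter_results(results, global_unique=True)
kept_per_parser = filter_results(results, global_unique=False)
```

    With the global setting only the first pair survives; with per-parser uniqueness each parser keeps its own first occurrence.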

    Types of unique


    Parameter         Description
    String            Unique by string (the whole result string is compared)
    Domain            Unique by domain (the whole domain is compared; for example, www.domain.com and domain.com are different domains)
    Top Level domain  Unique by top-level domain, taking into account regional, commercial, educational and other domains (for example, domain.co.uk and domain2.co.uk are different domains, but sub1.domain.com and sub2.domain.com are identical)
    2nd Level domain  Unique by second-level domain (second-level domains are compared; for example, www.domain.com, domain.com and user.subdomain.domain.com are all one domain)
    Path              Unique by path (folders are compared; for example, http://domain.com/path1/file.php and http://domain.com/path1/file2.php are in identical folders)
    Without params    Unique by link without parameters (links are compared with their parameters stripped; for example, http://domain.com/file.php?page=1 and http://domain.com/file.php?page=2 are identical links)
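
    One way to think about the types above is that each one maps a result to a comparison key, and two results are duplicates when their keys match. The sketch below is an approximation written for illustration only: unique_key, dedupe and the kind names are hypothetical, and MULTI_LABEL_SUFFIXES is a deliberately tiny stand-in for a real list of regional and other multi-part suffixes.

```python
from urllib.parse import urlsplit

# Hypothetical, very small substitute for a full public-suffix list.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def unique_key(url, kind):
    """Return the value that is compared for the given type of unique."""
    if kind == "string":
        return url
    # Accept bare hosts such as "www.domain.com" as well as full URLs.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    if kind == "domain":
        return host
    if kind == "top_level_domain":
        suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
        return ".".join(labels[-(suffix_len + 1):])
    if kind == "second_level_domain":
        return ".".join(labels[-2:])
    if kind == "path":
        folder = parts.path.rsplit("/", 1)[0] + "/"
        return f"{parts.scheme}://{host}{folder}"
    if kind == "without_params":
        return f"{parts.scheme}://{host}{parts.path}"
    raise ValueError(f"unknown kind: {kind}")


def dedupe(urls, kind):
    """Keep the first URL for every distinct key."""
    seen, kept = set(), []
    for url in urls:
        key = unique_key(url, kind)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


pages = ["http://domain.com/file.php?page=1", "http://domain.com/file.php?page=2"]
kept_pages = dedupe(pages, "without_params")
```

    Here kept_pages contains only the first link, because both links share the key http://domain.com/file.php once the parameters are dropped.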


    Unique of requests


    Unique of requests sends for parsing only unique requests, that is, requests that have not already been parsed in the current task. Main use cases:
    • When the initial requests contain duplicates and parsing them again is undesirable (double work)
    • When using the Parse to level option, only unique results should be used to prevent growth and cycling of requests (for example, when using the HTML::LinkExtractor parser)
    In all other cases, unnecessary use of unique of requests will only slow the parser down.
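
    Why unique requests matter when parsing to a level can be seen in a toy crawl. The sketch below is not A-Parser code: crawl, extract_links and the LINKS graph are hypothetical, and the link extractor is replaced by a dictionary lookup so the example runs on its own.

```python
from collections import deque


def crawl(start_requests, extract_links, max_level):
    """Breadth-first crawl that never queues the same request twice."""
    seen = set()
    queue = deque()
    for request in start_requests:
        if request not in seen:  # drop duplicates in the initial requests
            seen.add(request)
            queue.append((request, 0))
    order = []
    while queue:
        request, level = queue.popleft()
        order.append(request)
        if level >= max_level:
            continue
        for link in extract_links(request):
            if link not in seen:  # without this check, a <-> b would cycle
                seen.add(link)
                queue.append((link, level + 1))
    return order


# Hypothetical site where every page links back to the others.
LINKS = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
visited = crawl(["a", "a"], lambda r: LINKS.get(r, []), max_level=5)
```

    Each page is visited once even though the pages link to each other and the start list contains a duplicate. Without the seen set, the queue would keep growing until the level limit is reached.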

    Save your specified unique for later use


    A-Parser can save the base of unique for use in future tasks, so that new tasks save only new unique results (for example, links when parsing search results with SE::Google).
    To save the base of unique, create a new base name when adding the first task:
    [IMG]
    For all subsequent tasks, select the previously created base name. Only new unique results will then be kept, regardless of whether the results are written to the same file as in the first task or to a new file.
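
    The effect of a saved base can be sketched as a set of already seen results that is kept on disk between tasks. The file format, the run_task function and the links.base file name below are assumptions for the example; they say nothing about how A-Parser actually stores its bases.

```python
import os
import tempfile


def run_task(results, base_path):
    """Return only results not present in the saved base, then extend the base."""
    seen = set()
    if os.path.exists(base_path):
        with open(base_path, encoding="utf-8") as f:
            seen = {line.rstrip("\n") for line in f}
    new = []
    with open(base_path, "a", encoding="utf-8") as f:
        for item in results:
            if item not in seen:
                seen.add(item)
                new.append(item)
                f.write(item + "\n")  # remember it for later tasks
    return new


with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "links.base")  # hypothetical base name
    first = run_task(["http://a.com/", "http://b.com/"], base)
    second = run_task(["http://b.com/", "http://c.com/"], base)
```

    The second task returns only http://c.com/, because http://b.com/ is already in the base created by the first task. Where the results themselves are written does not affect this.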