Formatting and substitutions of requests

Dec 8, 2015


  • Query format - lets you add substitutions and bring each request into the required form using templates; it is applied to every request

    [​IMG]

    There are 2 ways to specify a template:
    • General format of requests - it is processed first and supports substitutions
    • Request format for each parser - allows setting a specific format for individual parsers
    Consider the example in the screenshot and assume that the requests come from a file containing a list of domains such as google.com.

    The general format of requests is set so that the string http:// is added before each initial request (domain), and the request is transformed: google.com -> http://google.com

    The request format for parser 1 is left unchanged, so parser 1 will parse the request http://google.com
    The request format for parser 2 adds the site: prefix, so the request for this parser is transformed: http://google.com -> site:http://google.com
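The two-stage processing described above (general format first, then the format of each parser) can be sketched in Python. This is only an illustration: the template strings http://$query and site:$query are assumptions inferred from the described transformations, and format_request is a hypothetical helper, not part of the tool.

```python
from string import Template

# Assumed template strings: the text describes their effect, but the
# literal values are not shown, so these are illustrative guesses.
GENERAL_FORMAT = Template("http://$query")
PARSER_FORMATS = {
    "parser 1": Template("$query"),       # left unchanged
    "parser 2": Template("site:$query"),  # adds the site: prefix
}


def format_request(original: str) -> dict[str, str]:
    """Apply the general format first, then each parser's own format."""
    general = GENERAL_FORMAT.substitute(query=original)
    return {
        name: fmt.substitute(query=general)
        for name, fmt in PARSER_FORMATS.items()
    }


if __name__ == "__main__":
    for name, request in format_request("google.com").items():
        print(f"{name}: {request}")
```

Each parser receives the output of the general format as its own $query, which is why parser 2 ends up with site:http://google.com rather than site:google.com.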

    Templates in requests



    Template Toolkit is fully supported in the request format; the following variables are available:
    • $query - the request after the general format has been applied
    • $query.num - the sequence number of the request
    • $query.lvl - the nesting level of the request when using the options Parse to level or Parse all results
    • $query.orig - the initial request before formatting
    • $query.first - the first request when using the options Parse to level or Parse all results
    • All variables created through the Query builder
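How these variables relate to each other can be shown with a small Python sketch. It does not use Template Toolkit; a plain regular-expression substitution stands in for it, and QueryContext, render and all sample values are hypothetical names and data chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class QueryContext:
    """Hypothetical container for the variables listed above."""
    query: str   # request after the general format
    orig: str    # initial request before formatting
    num: int     # sequence number of the request
    lvl: int     # nesting level of the request
    first: str   # first request in the chain


# Matches $query and $query.<field>; unknown suffixes are left as text.
_VAR = re.compile(r"\$query(?:\.(num|lvl|orig|first)\b)?")


def render(template: str, ctx: QueryContext) -> str:
    """Replace $query and $query.<field> with values from ctx."""
    def repl(match: re.Match) -> str:
        field = match.group(1) or "query"
        return str(getattr(ctx, field))
    return _VAR.sub(repl, template)


if __name__ == "__main__":
    ctx = QueryContext(query="http://google.com", orig="google.com",
                       num=3, lvl=0, first="google.com")
    print(render("$query.num: $query (was $query.orig)", ctx))
```

The sample values (num=3, lvl=0) are made up; how the tool actually numbers requests and levels is not stated here.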

    Substitution macros


    The general format of requests supports the following macros:

    Macro: {az:START:END}
    Description: Substitution of an alphanumeric sequence
    • START is the beginning of the sequence, END is the end
    • The length of END must be greater than or equal to the length of START
    • The characters of END must come after the characters of START in alphabetical order
    • Any UTF-8 string can be used
    Examples:
    • {az:a:z} - substitution of all characters from a to z (a, b, c, ..., y, z)
    • {az:aaa:zzz} - substitution of all sequences from aaa to zzz (aaa, aab, aac, ..., zzy, zzz)
    • {az:a:zz} - substitution of all sequences from a to zz (a, b, c, ..., aa, ab, ..., zy, zz)
    • {az:00:99} - substitution of all numbers from 00 to 99 (00, 01, 02, ..., 98, 99)
    • {az:а:яяя} - substitution of all Cyrillic sequences from а to яяя (а, ..., аа, аб, ..., яяю, яяя)

    Macro: {each:WORD1,WORD2,...}
    Description: Substitution of the specified words WORD1, WORD2, etc.; length isn't restricted
    Examples:
    • {each:green, blue, red, black} - substitution of the words green, blue, red, black
    • {each:,buy,sell} - substitution of the empty word, then buy and sell

    Macro: {subs:NAME}
    Description: Substitution of additional words from files in the folder queries/subs/
    • NAME is the file name without the .txt extension
    Examples:
    • {subs:zones} - substitution of all lines from the file queries/subs/zones.txt

    Macro: {num:START:END}
    Description: Enumerates the numbers in the specified range
    • START is the beginning of the interval, END is the end
    Examples:
    • {num:1:1000} - substitution of all numbers from 1 to 1000 (1, 2, 3, ..., 999, 1000)

    Macro: {num:START:END:STEP}
    Description: Enumerates the numbers in the specified range with the specified step
    • START is the beginning of the interval, END is the end, STEP is the step
    Examples:
    • {num:0:1000:10} - substitution of all numbers from 0 to 1000 with step 10 (0, 10, 20, ..., 990, 1000)
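The sequences these macros produce can be reproduced with a short Python sketch. It is a simplified model, not the tool's implementation: expand_az, expand_num and expand_each are hypothetical helpers, and expand_az only handles the case shown in the examples, where START repeats the lowest character and END repeats the highest one.

```python
from itertools import product


def expand_az(start: str, end: str) -> list[str]:
    """Expand {az:START:END} under a simplifying assumption.

    Assumes START repeats the lowest character and END repeats the
    highest one (as in a:z, aa:zz, 00:99); other inputs are rejected.
    """
    if not start or not end:
        raise ValueError("START and END must be non-empty")
    low, high = start[0], end[-1]
    if (set(start) != {low} or set(end) != {high}
            or len(end) < len(start) or high < low):
        raise ValueError("unsupported range for this sketch")
    alphabet = [chr(code) for code in range(ord(low), ord(high) + 1)]
    return [
        "".join(chars)
        for length in range(len(start), len(end) + 1)
        for chars in product(alphabet, repeat=length)
    ]


def expand_num(start: int, end: int, step: int = 1) -> list[str]:
    """Expand {num:START:END} or {num:START:END:STEP}, END inclusive."""
    return [str(n) for n in range(start, end + 1, step)]


def expand_each(words: str) -> list[str]:
    """Expand {each:WORD1,WORD2,...}; an empty item yields an empty word."""
    return words.split(",")


if __name__ == "__main__":
    print(expand_az("a", "zz")[24:29])
    print(expand_num(0, 1000, 10)[:4])
    print(expand_each(",buy,sell"))
```

Shorter sequences are emitted before longer ones, which matches the order a, b, ..., z, aa, ab, ... given for {az:a:zz}. Whether the tool trims spaces around the words in {each:...} is not stated, so expand_each leaves them untouched.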


    Substitution macros can be combined. A complex example:
    Suppose one of the requests to be parsed is viagra, and queries/subs/zones.txt contains the following list of zones: com, net, org. The following set of combinations will then be sent for parsing:

    The total number of requests is the product of the possible combinations: 1 request (viagra) x 3 zones ({subs:zones}) x 676 character variations ({az:aa:zz}) = 2028 requests
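The arithmetic can be checked with a short Python sketch that builds the Cartesian product of the three lists. How the tool joins the parts into a final request string is not shown here, so the sketch keeps them as tuples; count_combinations is a hypothetical helper.

```python
from itertools import product
from string import ascii_lowercase


def count_combinations(requests: list[str], zones: list[str]) -> int:
    """Count request x zone x two-letter combinations ({az:aa:zz})."""
    pairs = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
    combos = list(product(requests, zones, pairs))
    return len(combos)


if __name__ == "__main__":
    requests = ["viagra"]
    zones = ["com", "net", "org"]  # lines of queries/subs/zones.txt
    print(26 ** 2)                              # 676
    print(count_combinations(requests, zones))  # 2028
```

Adding a second request or a fourth zone multiplies the total accordingly, so combined macros can grow the request list quickly.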