Representation of results

Nov 13, 2015


  • A-Parser was created to parse information of any kind, so it provides two types of results:
    • Simple results (Flat)
    • Arrays of results (Array)
    We will look at each type using the SE::Google parser as an example; screenshot of a search result:
    [​IMG]

    Simple results


    Simple results are used when one request corresponds to exactly one result. Examples:
    • Number of results for the request ($totalcount)
    • Whether the request contains a misspelling ($misspell, not shown on the screenshot)
    Simple results are stored in ordinary variables ($ prefix + name).

    Arrays of results


    Arrays of results are used when one request corresponds to a list of results; each element of the list may in turn contain several nested elements. Consider Google's search output as an example: the parser provides it as the $serp array. For clarity, the table below shows the first 5 results of the output:
    Link ($link) | Anchor ($anchor) | Snippet ($snippet)
    http://www.speedtest.net/ | Speedtest.net by Ookla - The Global Broadband Speed Test | Test your Internet connection bandwidth to locations around the world with this interactive broadband speed test from Ookla.
    http://en.wikipedia.org/wiki/Test_cricket | Test cricket - Wikipedia, the free encyclopedia | Test cricket is the longest form of the sport of cricket. Test matches are played between national representative teams with "Test status", as determined by the ...
    http://www.speakeasy.net/speedtest/ | Speakeasy Speed Test | Saturday 03-May 2014, 11:04:29 AM Your IP: The Speakeasy Speed Test requires Flash v7 or higher. Please update your browser. See Pricing Or Call Today
    http://www.humanmetrics.com/cgi-win/jtypes2.asp | Personality test based on C. Jung and I. Briggs Myers type theory | Humanmetrics Jung Typology Test™ instrument uses methodology, questionnaire, scoring and software that are proprietary to Humanmetrics, and shall not be ...
    http://test-ipv6.com/ | Test your IPv6. | This will test your browser and connection for IPv6 readiness, as well as show you your current IPV4 and IPv6 address. ... Test your IPv6 connectivity. JavaScript ...

    Each item of the search output is written to the array with 3 nested elements: the link ($link), the anchor ($anchor), and the snippet ($snippet).
    Another example is the list of related keywords, which is stored in the $related array:
    Keyword ($key)
    test wwe
    depression test
    test my speed
    wonderlic test
    test personality
    act test
    jiggle test
    bipolar test

    As you can see, this array has only one nested element: the keyword ($key).
    Array elements are numbered starting from 0. Examples of accessing individual array elements:
    • $serp.0.link - the first link from output
    • $serp.3.anchor - the fourth anchor from output
    • $related.0.key - first related keyword
    Formatting of simple results and arrays is described in more detail below.
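The dotted access notation can be illustrated outside A-Parser with a short Python sketch. This is only an emulation under stated assumptions: the dictionary layout and the `lookup` helper are hypothetical and are not part of A-Parser; the sample values come from the tables above.

```python
# Sample data shaped like the $serp and $related arrays described above
# (values taken from the tables; the dict layout itself is an assumption).
results = {
    "serp": [
        {"link": "http://www.speedtest.net/",
         "anchor": "Speedtest.net by Ookla - The Global Broadband Speed Test"},
        {"link": "http://en.wikipedia.org/wiki/Test_cricket",
         "anchor": "Test cricket - Wikipedia, the free encyclopedia"},
        {"link": "http://www.speakeasy.net/speedtest/",
         "anchor": "Speakeasy Speed Test"},
        {"link": "http://www.humanmetrics.com/cgi-win/jtypes2.asp",
         "anchor": "Personality test based on C. Jung and I. Briggs Myers type theory"},
    ],
    "related": [{"key": "test wwe"}, {"key": "depression test"}],
}


def lookup(data, path):
    """Resolve a dotted path such as '$serp.0.link' (hypothetical helper)."""
    node = data
    for part in path.lstrip("$").split("."):
        # Numeric parts index into lists (0-based); other parts are dict keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


print(lookup(results, "$serp.0.link"))     # first link
print(lookup(results, "$serp.3.anchor"))   # fourth anchor
print(lookup(results, "$related.0.key"))   # first related keyword
```

Because numbering starts at 0, index 3 addresses the fourth element of the array.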

    Viewing of possible results


    Each parser has its own set of results. To view the list of available results, hover the pointer over the parser: a tooltip shows the list of simple results and arrays, together with their nested elements:

    [​IMG]
    Results common to all parsers are marked in yellow:
    • $query - the request passed to the parser after formatting
    • $query.orig - the original request (exactly as it appeared in the file or in the request field)
    • $query.first - the first request when nested parsing options are used (Parse all results or Parse to level)
    • $info.success - whether parsing of this request succeeded
    • $info.retries - number of retries used for this request
    • $info.stats - statistics of the parser's work for this request
    • $pages.$i.data - an array with the raw responses from the server, so that additional information can be extracted freely
    Results available only for the SE::Google parser are marked in green:
    • $related.$i.key - an array with the list of related keywords
    • $ads with elements $link, $anchor and $snippet - an array with the list of ads
    • $serp with elements $link, $anchor and $snippet - an array with the main output of the search engine
    Note that for arrays the $i variable is specified explicitly. It means that there are several elements, and they can be accessed by index (position number) or iterated over in a loop.

    The $response variable is also available; it lets you get any query variables, including all previous redirects.
    [IMG]


    Basic principles of formatting


    After the parser has collected data into simple results and arrays, the data needs to be displayed (or saved to a file) in the required format. For convenience and flexibility, A-Parser uses Template Toolkit. We will look at the most frequently used constructions with the help of the Template tester tool. Select the project for the SE::Google parser:
    [​IMG]
    The screenshot shows 3 fields:
    • JSON - the internal data representation in the parser
    • Template - the template used to format the result
    • Result - the data transformed according to the specified template; this is how the result will be written to the file
    By changing the template we can change the form of the result. Consider the following template:
    [​IMG]
    The basic rules are:
    • Plain text is output to the result as is, without changes
    • To output a simple result, put the variable that contains it (with the $ prefix) in the right place
    • Arrays are formatted with the format method, described below
    • \n produces a line break
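These rules can be approximated in Python with `string.Template`, which happens to use the same `$name` placeholder style. This is a rough sketch for illustration only, not Template Toolkit and not A-Parser's engine; the `rows` list is an assumed stand-in for the simple results $query and $totalcount of two requests.

```python
from string import Template

# Plain text (": ") is copied as is, $query and $totalcount are replaced
# by their values, and "\n" ends the line.
template = Template("$query: $totalcount\n")

# Assumed stand-in for the simple results of two separate requests.
rows = [
    {"query": "test", "totalcount": 3910000000},
    {"query": "viagra", "totalcount": 278000000},
]

# Render the template once per request and concatenate the pieces.
output = "".join(template.substitute(row) for row in rows)
print(output, end="")
```

Each request contributes one rendered copy of the template, so the line break at the end of the template is what keeps the results on separate lines.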

    Formatting of arrays. Consider the following construction:
    Code:
    $serp.format('$link $anchor\n$snippet\n\n')
    This means that the format method is called on the $serp array with the parameter '$link $anchor\n$snippet\n\n'. The format method joins all array elements into one string according to the template given in the parameter. Here the template means: for each element of the $serp array, output the link and the anchor separated by a space, then output the snippet on a new line, followed by two more line breaks, which produces a blank line between results.
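The behavior described here can be emulated in a few lines of Python. This is a minimal sketch of the idea only: `format_array` is a hypothetical helper, the sample entries are invented placeholders, and rendering unknown names as empty strings is an assumption rather than documented A-Parser behavior.

```python
import re


def format_array(items, template):
    """Render the template once per array element and join the pieces."""
    def render(item):
        # Replace each $name with the matching nested element of this item;
        # unknown names become empty strings (an assumption of this sketch).
        return re.sub(r"\$(\w+)", lambda m: str(item.get(m.group(1), "")), template)
    return "".join(render(item) for item in items)


# Hypothetical placeholder entries standing in for the $serp array.
serp = [
    {"link": "http://example.com/a", "anchor": "First", "snippet": "First snippet"},
    {"link": "http://example.com/b", "anchor": "Second", "snippet": "Second snippet"},
]

# Equivalent of $serp.format('$link $anchor\n$snippet\n\n')
print(format_array(serp, "$link $anchor\n$snippet\n\n"), end="")
```

The two trailing line breaks in the template are what leave a blank line after every result.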

    Output of variable values as JSON:
    Code:
    $results.json
    The .json method outputs data in JSON format:
    [​IMG]
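For readers who want to reproduce the idea outside A-Parser, the following Python sketch serializes a result structure to JSON. The exact layout that A-Parser produces is not shown in the text, so the structure of `results` here is an assumption built from values used in this article.

```python
import json

# Assumed result structure: one simple result plus one array of results.
results = {
    "query": "test",
    "totalcount": 3910000000,
    "related": [{"key": "test wwe"}, {"key": "depression test"}],
}

# ensure_ascii=False keeps non-ASCII keywords readable in the output.
text = json.dumps(results, ensure_ascii=False, indent=2)
print(text)
```

Simple results become plain values, while arrays of results become lists of objects with one field per nested element.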

    Examples
    Output of the competition for a request (number of results for the request), for all search engine parsers (SE::Google, SE::Yandex...):
    Code:
    $query: $totalcount\n
    Result:
    Code:
    test: 3910000000
    viagra: 278000000
    окна пвх: 3220000
    ...

    Output of links from search engine results:
    Code:
    $serp.format('$link\n')
    Result:
    Code:
    http://www.speedtest.net/
    http://www.speakeasy.net/speedtest/
    http://en.wikipedia.org/wiki/Test_cricket
    http://www.humanmetrics.com/cgi-win/jtypes2.asp
    http://html5test.com/
    http://test-ipv6.com/
    ...

    Output of suggestions from search engines:
    Code:
    $results.format('$suggest\n')
    Result:
    Code:
    тестовый сервер танки онлайн
    тесты гиа по русскому языку
    тесто для блинов рецепт
    тестикула
    тесто для пиццы на молоке
    ...

    Output of statistics for a keyword using the SE::Yandex::WordStat parser:
    Code:
    тест - 11233054, updated: 30.04.2014
    keywords:
    тест: 11233054
    тест класс: 1319919
    тест драйв: 1051495
    тесты онлайн: 827044
    тесто +для теста: 729279
    тесты 2014: 592935
    ...
    
    additional keywords:
    mail: 20449501
    анекдоты: 1813239
    анекдоты +из россии: 22754
    анекдоты приколы: 9122
    приколы: 4677777
    test: 872855
    ...