
Result builder

Oct 20, 2015

  • Results builder: transforms the queries and results of each parser before they are formatted and saved to disk

    • Splitting a result into parts by a regular expression or an arbitrary separator
    • Replacing a substring in a result, either literally or by regular expression
    • Extracting the domain or main domain from a link
    • Converting a result to upper or lower case
    • Deleting HTML tags (<b>text</b> -> text)
    • Conversion of HTML entities to their Unicode equivalents (&copy; -> ©)
    • Retrieving data using XPath queries
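
    The text transformations listed above can be sketched outside A-Parser in plain Python. This is only an illustration of the operations, not A-Parser's own implementation; the function names and sample strings are hypothetical.

```python
import re


def split_result(value, pattern):
    """Split a result into parts by a regular expression (hypothetical helper)."""
    return re.split(pattern, value)


def replace_regex(value, pattern, replacement):
    """Replace every regex match in a result (hypothetical helper)."""
    return re.sub(pattern, replacement, value)


# Split on a comma or semicolon followed by optional whitespace.
parts = split_result("red, green;blue", r"[,;]\s*")

# Literal substring replacement and regex replacement.
literal = "http://example.com".replace("http://", "https://")
masked = replace_regex("Price: 100 USD", r"\d+", "N")

# Case conversion.
lowered = "Example.COM".lower()
uppered = "example.com".upper()
```

    A literal separator can be handled with str.split instead of re.split when no pattern matching is needed.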


    Main use cases

    • Saving only domains in the result
    • Stripping HTML tags from text
    • Finding and replacing substrings
    • Parsing arbitrary information with regular expressions or XPath queries


    Saving only domains when parsing links from search engines

    The source (Source result) is the link element of the serp array from the first parser (p1). The main-domain extraction function is applied to each element, and the new result is saved under the same name (the link element of the serp array), so the result format does not need to be changed.
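
    The idea of overwriting each link with its domain can be sketched in Python. This is a rough illustration under stated assumptions, not A-Parser's implementation: the function names and the sample serp data are hypothetical, and "main domain" is assumed to mean the host without its subdomains. The last-two-labels rule used here is naive and gives wrong answers for suffixes such as co.uk; a correct version would need the Public Suffix List.

```python
from urllib.parse import urlsplit


def extract_domain(link):
    """Return the lowercased host of a link, or "" if it has none.

    Links without a scheme (for example "example.com/page") have no
    host as far as urlsplit is concerned, so they yield "".
    """
    return urlsplit(link).hostname or ""


def extract_main_domain(link):
    """Naive assumption: the main domain is the last two host labels."""
    labels = extract_domain(link).split(".")
    return ".".join(labels[-2:])


# Hypothetical serp array with a link element, mirroring the description.
serp = [
    {"link": "https://www.blog.example.com/post?id=1"},
    {"link": "http://docs.example.org/guide"},
]

# Save the new value under the same name, so the output format is unchanged.
for item in serp:
    item["link"] = extract_main_domain(item["link"])
```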

    Saving snippets from search engines with cleaning from HTML tags and conversion of HTML entities

    By default, anchors and snippets are parsed with all nested tags, which preserves the same formatting seen in search engine output. If only plain text is needed, the Results builder can be used.
    In this example, two Results builder functions are applied to the snippets in sequence: removing HTML tags and converting HTML entities.
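
    The same two-step cleanup can be sketched in Python. This is an illustration only: the helper names and the sample snippet are hypothetical, and the regular expression is a simple approximation that is not a full HTML parser.

```python
import html
import re

# Matches anything that looks like a tag: "<", then non-">" characters, then ">".
TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags(text):
    """Step 1: delete HTML tags, keeping the text between them."""
    return TAG_RE.sub("", text)


def decode_html_entities(text):
    """Step 2: convert HTML entities to their Unicode equivalents."""
    return html.unescape(text)


snippet = "<b>A-Parser</b> &copy; 2015 &amp; later"
clean = decode_html_entities(remove_html_tags(snippet))
```

    The order matters in this sketch: tags are removed first, so that text such as &lt;b&gt; is decoded to a literal <b> afterwards instead of being stripped as if it were markup.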

    Parsing links from search results using XPath

    This example shows how to parse links from the search engine search.disconnect.com. The XPath query used is //ul[contains(@id,'normal-results')]/li/a/@href
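
    A comparable query can be tried in Python with the standard library. This is a sketch under several assumptions: the markup below is invented for illustration and is not the real page; xml.etree.ElementTree supports only a subset of XPath, so contains(@id, ...) is replaced by an exact id match and the /@href step by reading the attribute; and the input must be well-formed XML, which real HTML pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not the real search page).
PAGE = """
<html><body>
  <ul id="normal-results">
    <li><a href="https://example.com/one">One</a></li>
    <li><a href="https://example.org/two">Two</a></li>
  </ul>
  <ul id="ads">
    <li><a href="https://ads.example.net/">Ad</a></li>
  </ul>
</body></html>
"""


def extract_links(markup):
    """Return href values of links inside the results list."""
    root = ET.fromstring(markup.strip())
    # Exact-match predicate stands in for contains(@id, 'normal-results').
    anchors = root.findall(".//ul[@id='normal-results']/li/a")
    # Reading the attribute stands in for the trailing /@href step.
    return [a.get("href") for a in anchors if a.get("href")]


links = extract_links(PAGE)
```

    A library with full XPath 1.0 support would be needed to run the original query unchanged.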