HTML::TextExtractor - Parser of text units

Nov 15, 2016

    Collected data


    • Parsing text units from the specified page

    Capabilities


    • Automatic stripping of HTML tags from the text
    • Configurable minimum length of a text unit
    • Optional removal of link anchor text
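
The three capabilities listed here can be illustrated with a minimal sketch. It is not the parser's actual implementation: the names `TextUnitExtractor` and `extract_text_units`, the choice of block-level tags, and the rule that a text unit is the text between block-level tag boundaries are all assumptions made for illustration. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these tags mark the boundary of a text unit.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "tr", "ul", "ol", "table", "br",
              "h1", "h2", "h3", "h4", "h5", "h6", "section", "article", "blockquote"}
# Content of these tags is never visible text, so it is dropped.
IGNORED_TAGS = {"script", "style"}


class TextUnitExtractor(HTMLParser):
    """Hypothetical helper: collects tag-free text units from an HTML page."""

    def __init__(self, min_block_length=50, skip_anchor_text=False):
        super().__init__(convert_charrefs=True)
        self.min_block_length = min_block_length
        self.skip_anchor_text = skip_anchor_text
        self.units = []
        self._buffer = []
        self._ignored_depth = 0   # > 0 while inside <script> or <style>
        self._anchor_depth = 0    # > 0 while inside <a>

    def _flush(self):
        # Collapse whitespace, then keep the unit only if it is long enough.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if text and len(text) >= self.min_block_length:
            self.units.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self._ignored_depth += 1
        elif tag == "a":
            self._anchor_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS:
            self._ignored_depth = max(0, self._ignored_depth - 1)
        elif tag == "a":
            self._anchor_depth = max(0, self._anchor_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._ignored_depth:
            return
        if self.skip_anchor_text and self._anchor_depth:
            return
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit whatever text remains after the last tag


def extract_text_units(html, min_block_length=50, skip_anchor_text=False):
    parser = TextUnitExtractor(min_block_length, skip_anchor_text)
    parser.feed(html)
    parser.close()
    return parser.units
```

In this sketch, `extract_text_units(page_html)` returns a list of strings with the tags removed. Passing `skip_anchor_text=True` drops the text inside `<a>` elements, and raising `min_block_length` filters out short fragments such as menu items.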

    Use cases


    • Parsing text content from any site

    Requests


    • Each request is a link to a page from which text units should be parsed, for example:


    Results


    • The result is the text extracted from the page specified in the request:


    Possible settings


    Global settings for all parsers
    Parameter            Default value    Description
    Min block length     50               Minimum length of a text unit, in characters
    Skip anchor text                      Whether to skip link anchor text
    Bypass CloudFlare                     Automatically bypass the CloudFlare browser check