
HTML::TextExtractor - Parser of text units

Nov 15, 2016

  • Collected data

    • Parsing text units from the specified page


    • Automatic removal of HTML tags from the text
    • Option to specify the minimum length of a text unit
    • Optional removal of link anchor text from the result
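
    The features listed above can be illustrated with a small Python sketch. This is not the parser's actual implementation, only an approximation of the described behaviour built on the standard library. The names `TextUnitExtractor`, `extract_text_units`, `min_block_length` and `skip_anchor_text` are hypothetical, as is the choice of which tags separate text units and the assumption that anchor skipping is off unless requested.

```python
from html.parser import HTMLParser

# Assumption: these tags mark the boundaries between text units.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "tr", "br", "pre", "blockquote",
              "section", "article", "h1", "h2", "h3", "h4", "h5", "h6"}
# Content of these tags is never visible text, so it is dropped.
IGNORED_TAGS = {"script", "style"}


class TextUnitExtractor(HTMLParser):
    """Collects tag-free text units from an HTML document (illustrative only)."""

    def __init__(self, min_block_length=50, skip_anchor_text=False):
        super().__init__(convert_charrefs=True)
        self.min_block_length = min_block_length
        self.skip_anchor_text = skip_anchor_text
        self.units = []
        self._buffer = []
        self._ignored_depth = 0
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS:
            self._ignored_depth += 1
        elif tag == "a":
            self._anchor_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS and self._ignored_depth:
            self._ignored_depth -= 1
        elif tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._ignored_depth:
            return
        if self.skip_anchor_text and self._anchor_depth:
            return  # drop the visible text of links
        self._buffer.append(data)

    def _flush(self):
        # Collapse whitespace, then keep the unit only if it is long enough.
        text = " ".join("".join(self._buffer).split())
        self._buffer = []
        if len(text) >= self.min_block_length:
            self.units.append(text)

    def close(self):
        super().close()
        self._flush()  # emit any text left after the last block tag


def extract_text_units(html, min_block_length=50, skip_anchor_text=False):
    parser = TextUnitExtractor(min_block_length, skip_anchor_text)
    parser.feed(html)
    parser.close()
    return parser.units
```

    Calling `extract_text_units(page_html)` returns a list of cleaned text units. Raising `min_block_length` filters out short fragments such as menu items, and `skip_anchor_text=True` leaves link text out of the units.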

    Use options

    • Parsing text content from any site


    • As queries, specify links to the pages from which text units should be parsed, for example:


    • The result is the text from the page specified in the query:

    Possible settings

    Global settings for all parsers
    Parameter            Value by default    Description
    Min block length     50                  Minimum length of a text unit, in characters
    Skip anchor text                         Whether to skip link anchor text
    Bypass CloudFlare                        Automatically bypass the CloudFlare browser check