Net::HTTP - Downloads the specified page, supports multipage parsing

Dec 24, 2020

  • Collected data

    • Server response code
    • Server response description
    • Server response headers
    • Content
    • Proxy used for the request
    • Array with all pages (used when the Use Pages option is enabled)


    • Check content option: runs the specified regular expression against the received page; if it does not match, the page is loaded again through another proxy
    • Use Pages option: iterates over the specified number of pages with a given step. The $pagenum variable holds the current page number during iteration and should be placed in the query where the page number belongs
    • Check next page option: a regular expression that determines whether a next page exists; if it does, the parser follows it, up to the specified limit (0 means no limit)
    • Page as new query option: submits the link to the next page as a new query, which removes the limit on the number of pages that can be followed
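
The two pagination options above can be illustrated with a minimal Python sketch. This is not A-Parser code: the function names `expand_pages` and `follow_next_pages`, the `start` parameter, the injected `fetch` callable, and the loop guard against revisiting a URL are all assumptions made for illustration, and whether the limit counts the first page is also assumed.

```python
import re
from typing import Callable, List


def expand_pages(template: str, count: int, step: int = 1, start: int = 0) -> List[str]:
    """Use Pages idea: substitute $pagenum for `count` pages, advancing by `step`.

    The starting value (`start`) is an assumption of this sketch.
    """
    return [template.replace("$pagenum", str(start + i * step)) for i in range(count)]


def follow_next_pages(
    first_url: str,
    fetch: Callable[[str], str],
    next_page_regex: str,
    limit: int = 0,
) -> List[str]:
    """Check next page idea: follow links captured by group 1 of the regex.

    limit=0 means no limit; here the limit counts every loaded page,
    including the first one (an assumption).
    """
    pattern = re.compile(next_page_regex)
    pages: List[str] = []
    seen = set()  # guard against link cycles (added for safety in this sketch)
    url = first_url
    while url is not None and url not in seen:
        seen.add(url)
        content = fetch(url)
        pages.append(content)
        if limit and len(pages) >= limit:
            break
        match = pattern.search(content)
        url = match.group(1) if match else None
    return pages


# Hypothetical in-memory "site" used instead of real HTTP requests.
SITE = {
    "a": 'first <a href="b">next</a>',
    "b": 'second <a href="c">next</a>',
    "c": "last page",
}
NEXT_LINK = r'<a href="([^"]+)">next</a>'

urls = expand_pages("http://example.com/list?p=$pagenum", 3, step=10)
all_pages = follow_next_pages("a", SITE.__getitem__, NEXT_LINK)
limited_pages = follow_next_pages("a", SITE.__getitem__, NEXT_LINK, limit=2)
```

Passing the fetch function in as an argument keeps the sketch independent of any particular HTTP client or proxy handling.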

    Use options


    Queries must be specified as links to pages.

    Possible settings

    Global settings for all parsers
    Parameter | Value by default | Description
    Good status | All | Selects which server responses are considered successful. If the server returns a different response, the request is repeated through another proxy
    Good code RegEx | - | Regular expression used to check the response code
    Method | GET | Request method
    POST body | - | Content sent to the server when the POST method is used. Supports the variables $query (the request URL), $query.orig (the original query) and $pagenum (the page number when the Use Pages option is enabled)
    Cookies | - | Cookies to send with the request
    User agent | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) | User-Agent header sent when requesting pages
    Additional headers | - | Arbitrary request headers, with support for Template Toolkit features and variables from the Query builder
    Read only headers |  | Reads only the response headers. In some cases this saves traffic when the content does not need to be parsed
    Detect charset on content |  | Detects the character encoding from the page content
    Emulate browser headers |  | Emulates browser headers
    Max redirects count | 7 | Maximum number of redirects the parser will follow
    Max cookies count | 16 | Maximum number of cookies to store
    Bypass CloudFlare |  | Automatically bypasses the CloudFlare browser check
    Follow common redirects |  | Lets the parser follow http <-> https and www.domain <-> domain redirects within the same domain without counting them against the Max redirects count limit
    Engine | HTTP (Fast, JavaScript Disabled) | Selects the HTTP engine (faster, JavaScript disabled) or the Chrome engine (slower, JavaScript enabled)
    Chrome Headless |  | If enabled, the browser window is not displayed
    Chrome DevTools |  | Allows the Chromium debugging tools to be used
    Chrome Log Proxy connections |  | If enabled, Chrome connections are logged
    Chrome Wait Until | networkidle2 | Determines when a page is considered loaded
    Use HTTP/2 transport |  | Determines whether HTTP/2 is used instead of HTTP/1.1. For example, Google and Majestic ban requests made over HTTP/1.1 immediately
    Bypass CloudFlare with Chrome(Experimental) |  | Bypasses CloudFlare using Chrome
    Bypass CloudFlare with Chrome Max Pages | - | Maximum number of pages when bypassing CloudFlare with Chrome
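
The interaction between Follow common redirects and Max redirects count can be sketched in Python as follows. This is only an illustration of the rule described in the table, not the parser's actual implementation: the helper names `is_common_redirect` and `exceeds_redirect_limit` are hypothetical, and the sketch assumes a "common" redirect changes only the scheme and/or the `www.` prefix while keeping the path and query the same.

```python
from typing import List
from urllib.parse import urlsplit


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so that www.domain and domain compare equal."""
    return host[4:] if host.startswith("www.") else host


def is_common_redirect(src: str, dst: str) -> bool:
    """True for http <-> https and www.domain <-> domain hops on the same domain.

    Assumption of this sketch: path and query must stay unchanged.
    """
    a, b = urlsplit(src), urlsplit(dst)
    if a.scheme not in ("http", "https") or b.scheme not in ("http", "https"):
        return False
    host_a = a.hostname or ""
    host_b = b.hostname or ""
    if not host_a or _strip_www(host_a) != _strip_www(host_b):
        return False
    same_rest = (a.path or "/", a.query) == (b.path or "/", b.query)
    changed = a.scheme != b.scheme or host_a != host_b
    return same_rest and changed


def exceeds_redirect_limit(chain: List[str], max_redirects: int = 7) -> bool:
    """Count only the hops that are not common redirects against the limit."""
    hops = zip(chain, chain[1:])
    counted = sum(1 for src, dst in hops if not is_common_redirect(src, dst))
    return counted > max_redirects


# A chain made only of common redirects never counts against the limit.
common_chain = ["http://example.com/", "https://example.com/", "https://www.example.com/"]
# Eight ordinary redirects exceed the default limit of 7.
long_chain = [f"http://example.com/{i}" for i in range(9)]
```

The default of 7 mirrors the Max redirects count default listed in the table; everything else in the sketch is illustrative.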