Net::HTTP - Downloads the specified page, supports multipage parsing

Sep 23, 2019

  • Collected data

    • Response code from the server
    • Status description of the server response
    • Headers of the server response
    • Content
    • Proxy used for the request
    • Array with all pages (used when the Use Pages option is enabled)


    • Option Check content - applies a regular expression to the loaded page; if it does not match, the page is loaded again through another proxy
    • Option Use Pages - enumerates the specified number of pages with a given step. $pagenum is a variable that holds the current page number during iteration; place it in the query wherever the page number should be substituted.
    • Option Check next page - a regular expression that determines whether a next page exists; if it does, the parser moves on to it, within the specified limit (0 - no limit)
    • Option Page as new query - sends the link to the next page as a new query, which removes the limit on the number of pages to navigate
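
The paging and content-check options above can be illustrated with a minimal Python sketch. This is not the parser's own implementation: the function names, the injected `fetch` and `fetch_via` callables, the start/step semantics of Use Pages, and the reading of the limit as "total pages loaded" are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional, Sequence, Tuple


def expand_pages(template: str, count: int, start: int = 0, step: int = 1) -> List[str]:
    """Use Pages (assumed semantics): substitute $pagenum with each page number."""
    return [template.replace("$pagenum", str(start + i * step)) for i in range(count)]


def fetch_checked(
    fetch_via: Callable[[str, str], str],  # hypothetical: (url, proxy) -> content
    url: str,
    proxies: Sequence[str],
    check_re: str,
) -> Optional[Tuple[str, str]]:
    """Check content: retry through the next proxy until the regex matches."""
    for proxy in proxies:
        content = fetch_via(url, proxy)
        if re.search(check_re, content):
            return content, proxy
    return None  # no proxy produced content that passed the check


def follow_next_pages(
    fetch: Callable[[str], str],  # hypothetical: url -> content
    url: Optional[str],
    next_re: str,
    limit: int = 0,
) -> List[str]:
    """Check next page: follow the link captured by group 1 of next_re.

    limit is taken here as the maximum number of pages to load; 0 means no limit.
    """
    pages: List[str] = []
    while url is not None:
        content = fetch(url)
        pages.append(content)
        if limit and len(pages) >= limit:
            break
        match = re.search(next_re, content)
        url = match.group(1) if match else None
    return pages
```

For example, `expand_pages("http://example.com/list?page=$pagenum", 3, start=1)` yields the URLs for pages 1, 2 and 3. Note that with no limit the sketch keeps following links for as long as the regular expression matches, so a page that links to itself would loop forever.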

    Use options


    Queries must be specified as links to the pages to be downloaded.

    Possible settings

    Global settings for all parsers
    Parameter | Value by default | Description
    Good status | All | Selects which server responses are considered successful. If the server returns any other response, the request is repeated with another proxy
    Good code RegEx | - | Regular expression for checking the response code
    Method | GET | Request method
    POST body | - | Content sent to the server when the POST method is used. Supports the variables $query (URL of the request), $query.orig (the initial query) and $pagenum (page number when the Use Pages option is used)
    Cookies | - | Cookies to send with the request
    User agent | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) | User-Agent header sent when requesting pages
    Additional headers | - | Arbitrary request headers, with support for Template Toolkit features and variables from the Query builder
    Read only headers | | Reads only the headers. In some cases this saves traffic when there is no need to parse the content
    Detect charset on content | | Detects the encoding from the page contents
    Emulate browser headers | | Emulates browser headers
    Max redirects count | 7 | Maximum number of redirects the parser will follow
    Max cookies count | 16 | Maximum number of cookies to store
    Bypass CloudFlare | | Automatically bypasses the CloudFlare browser check
    Follow common redirects | | Allows the parser to follow http <-> https and www.domain <-> domain redirects within the same domain, bypassing the Max redirects count limit
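
To make the interplay between Follow common redirects and Max redirects count more concrete, here is a hedged Python sketch. It is only an illustration of the idea described in the table, not the parser's code: `is_common_redirect` and `count_redirects` are hypothetical helpers, and the sketch assumes that a "common" redirect changes nothing except the scheme and/or the `www.` prefix.

```python
from typing import Sequence
from urllib.parse import urlsplit


def _strip_www(host: str) -> str:
    """Drop a leading 'www.' so that www.domain and domain compare equal."""
    return host[4:] if host.startswith("www.") else host


def is_common_redirect(src: str, dst: str) -> bool:
    """True for http <-> https or www.domain <-> domain hops on the same domain.

    Assumption: path and query string stay the same across such a hop.
    """
    a, b = urlsplit(src), urlsplit(dst)
    if a.scheme not in ("http", "https") or b.scheme not in ("http", "https"):
        return False
    host_a, host_b = a.hostname or "", b.hostname or ""
    same_domain = _strip_www(host_a) == _strip_www(host_b)
    changed = a.scheme != b.scheme or host_a != host_b
    same_target = (a.path or "/") == (b.path or "/") and a.query == b.query
    return same_domain and changed and same_target


def count_redirects(chain: Sequence[str]) -> int:
    """Count only the hops that would count toward Max redirects count."""
    hops = zip(chain, chain[1:])
    return sum(1 for src, dst in hops if not is_common_redirect(src, dst))
```

Under these assumptions, a chain that goes from `http://example.com/a` to `https://example.com/a`, then to `https://www.example.com/a`, and finally to `https://www.example.com/login` uses up only one of the allowed redirects, because the first two hops are treated as common redirects.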