Optimize Google Scraping with Net::HTTP

Discussion in 'A-Parser Support Forum' started by scrapefun, Aug 16, 2015.

  1. scrapefun

    scrapefun A-Parser Enterprise License
    A-Parser Enterprise

    Joined:
    Feb 24, 2015
    Messages:
    188
    Likes Received:
    34
    I've set up a few custom Google parsers using the Net::HTTP parser and they work great, but I can never get anywhere close to the speeds I see when parsing with the Google parser.

    One reason I know of is that I'm using different user agents, which return larger responses, so that contributes to the slowdown. But are there any tips or best practices for optimizing a custom Google parser using Net::HTTP?
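
    To put a rough number on the response-size effect, here is a back-of-the-envelope sketch in Python. It is not A-Parser code, and every figure in it (latency, page sizes, per-connection bandwidth) is a made-up assumption. It only shows how a larger page lowers throughput when threads and delays stay the same.

```python
def pages_per_second(threads, latency_s, page_bytes, bandwidth_bytes_per_s):
    """Rough throughput if every thread fetches pages back to back.

    Hypothetical model: time per page = fixed latency + transfer time,
    where bandwidth is the assumed speed of a single proxy connection.
    """
    seconds_per_page = latency_s + page_bytes / bandwidth_bytes_per_s
    return threads / seconds_per_page


# Assumed numbers, for illustration only.
small = pages_per_second(threads=100, latency_s=1.0,
                         page_bytes=50_000, bandwidth_bytes_per_s=100_000)
large = pages_per_second(threads=100, latency_s=1.0,
                         page_bytes=250_000, bandwidth_bytes_per_s=100_000)

print(f"smaller page: {small:.1f} pages/s")
print(f"larger page:  {large:.1f} pages/s")
```

    Under these assumptions, the larger page spends most of each request on transfer time, so identical thread and delay settings still give a noticeably lower page rate.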

    I have the same settings for things like threads, query delay, proxy ban time, timeout, etc., so I know those aren't a factor. The query strings are also the same as far as I can tell; I'm basically just using newer user agents or mobile user agents.

    With the Google parser there is an option to enable sessions, but not with Net::HTTP. Could that be a reason for the slower scraping?

    I know this is a pretty general question but any tips/ideas for optimizing for speed would be welcome.

    As always, thanks!
     
    #1 scrapefun, Aug 16, 2015
    Last edited: Aug 16, 2015
  2. scrapefun

    I just noticed the "cookies" option for the Net::HTTP parser. Is this something I could use to benefit Google scraping? If so, what would I put here?
     
  3. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 16, 2012
    Messages:
    4,598
    Likes Received:
    2,181
    There is only one tip: experiment with the current settings (the ones you listed). As for cookies, I doubt they affect the speed in any way...
     
  4. scrapefun

    Thanks for the reply.

    I guess "speed" might have been the wrong word. Would the cookie setting or the "emulate browser headers" option help improve the success of the queries?

    I am using the same proxies as well, and I didn't know if using cookies or the browser headers option would improve the success rate of the queries, which in turn would make things faster since there would be fewer retries.
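
    To make the retry reasoning concrete, here is a small Python sketch. It is not A-Parser code. It assumes each attempt succeeds independently with the same probability and that a failed query is retried until it succeeds, which is a simplification. The function names and the example rates are hypothetical.

```python
def expected_attempts(success_rate):
    """Average number of requests needed per successful query.

    Assumes independent attempts with a fixed success probability
    and unlimited retries (a geometric distribution).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def successful_queries_per_minute(requests_per_minute, success_rate):
    """Share of the raw request rate that ends up as completed queries."""
    return requests_per_minute * success_rate


# Example rates are assumptions, for illustration only.
for rate in (0.5, 0.8, 0.95):
    attempts = expected_attempts(rate)
    done = successful_queries_per_minute(1000, rate)
    print(f"success rate {rate:.2f}: {attempts:.2f} attempts/query, "
          f"{done:.0f} completed queries/min at 1000 requests/min")
```

    Under this model, anything that raises the success rate lowers the number of attempts per query by the same factor, so the same thread count completes more queries per minute.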
     
