I've set up a few custom Google parsers using the Net::HTTP parser and they work great, but I can never seem to get anywhere close to the same speeds as parsing with the Google parser. I know one reason is that I'm using different user agents that deliver larger file sizes, so that contributes to the slowdown, but are there any other tips or best practices for optimizing a custom Google parser with Net::HTTP? I have the same settings for things like threads, query delay, proxy ban time, timeout, etc., so I know those aren't a factor, and the query strings are the same as far as I can tell - I'm basically just using newer user agents or mobile user agents. The Google parser has an option to enable sessions, but Net::HTTP doesn't - could that be a reason for slower scraping? I know this is a pretty general question, but any tips/ideas for optimizing for speed would be welcome. As always, thanks!
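To make the sessions question concrete, here's a rough Python sketch of what session/connection reuse actually does, using the third-party requests library with placeholder URLs and user agent - just an illustration of the idea, not the tool's actual Net::HTTP engine:

```python
import requests

queries = ["first query", "second query", "third query"]

# Without a session: every request opens (and tears down) its own
# TCP/TLS connection, so you pay the handshake cost each time.
for q in queries:
    r = requests.get("https://www.google.com/search",
                     params={"q": q},
                     timeout=10)
    print(q, r.status_code)

# With a session: the connection is kept alive and reused, and any
# cookies set by earlier responses are sent automatically on later ones.
with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0"})  # placeholder user agent
    for q in queries:
        r = s.get("https://www.google.com/search",
                  params={"q": q},
                  timeout=10)
        print(q, r.status_code)
```

The saving per request is small, but over thousands of queries the reuse adds up, which is presumably part of what the sessions option buys.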
I just noticed the "cookies" option for the Net::HTTP parser. Is this something I could use to benefit Google scraping? If so, what would I put there?
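For reference, a cookies field like this usually takes a raw Cookie header string (name=value pairs separated by "; "), often copied from a browser session. Here's a minimal Python sketch of sending one with the requests library - the cookie names/values below are made-up placeholders, not real Google cookies:

```python
import requests

# Placeholder cookie string in the usual "name=value; name=value" format.
cookie_string = "NID=placeholder_value; CONSENT=placeholder_value"

resp = requests.get(
    "https://www.google.com/search",
    params={"q": "example query"},
    headers={
        "User-Agent": "Mozilla/5.0",   # placeholder user agent
        "Cookie": cookie_string,       # sent as-is with the request
    },
    timeout=10,
)
print(resp.status_code)
```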
Just one tip - experiment with the current settings (the ones you've listed). As for cookies - I doubt they affect speed at all ...
Thanks for the reply. I guess "speed" might have been the wrong word. Would the cookie setting or the "emulate browser headers" option help improve the success rate of the queries? I'm using the same proxies as well, and I didn't know whether using cookies or the browser headers option would improve query success, which in turn would speed things up since there would be fewer retries.
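In case it helps frame the question, this is roughly what I understand "emulate browser headers" to mean - sending the header set a real browser would, so requests look less bot-like and get blocked (and retried) less often. A Python sketch with the requests library and placeholder header/proxy values, not the tool's actual implementation:

```python
import requests

# Example of the kind of headers a desktop Chrome browser sends.
# Values are illustrative, not a guaranteed-to-work recipe.
browser_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}

with requests.Session() as s:
    s.headers.update(browser_headers)
    # Placeholder proxy; swap in whatever proxies the job actually uses.
    s.proxies.update({"https": "http://user:pass@proxy.example:8080"})
    r = s.get("https://www.google.com/search",
              params={"q": "example query"},
              timeout=10)
    # Fewer blocks/captchas means fewer retries, which is where any
    # effective speed gain would come from.
    print(r.status_code, len(r.text))
```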