I know Google is hard to scrape without getting proxies banned, but I'm seeing something strange. When I use the task tester for the Google Parser, my proxies are blocked or shown a captcha. But if I manually take the same proxy and use it in a browser, it works with no blocks. Why would this be? Maybe Google can tell a full browser is being used and lets the proxy through? It doesn't matter whether I enable or disable HTTPS, use sessions or not, etc. Same result. Here is a short video I made showing the above steps in action: http://screencast.com/t/qtc8WL42GlQ5
Please note there are a few differences between your browser and the parser:
- The parser uses HTTP, while the browser uses HTTPS
- The parser always queries the .co.uk domain, while the browser switches between .com, .co.in, etc.
- The set of parameters in the query URL differs between the browser and the parser
- They may use different user agents
- The cookies may also differ
All of this can affect whether Google serves a captcha. To isolate which difference matters, you can reproduce the request outside the parser while matching the browser as closely as possible, as in the sketch below.
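Here is a minimal sketch (in Python with the requests library, just as a stand-in for the parser) of sending the same query through the same proxy while matching the browser's scheme, domain, parameters and user agent. The proxy address, user agent string and query below are placeholders, not values from your setup:

```python
import requests

# Placeholders: substitute your own proxy and the exact values your browser sends
# (copy them from the browser's developer tools, Network tab).
PROXY = "http://USER:PASS@1.2.3.4:8080"
proxies = {"http": PROXY, "https": PROXY}

headers = {
    # Use the exact user agent string from the working browser session
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Accept-Language": "en-GB,en;q=0.9",
}

# Match the scheme, domain and query parameters the browser uses
url = "https://www.google.com/search"
params = {"q": "test query", "hl": "en"}

resp = requests.get(url, headers=headers, params=params, proxies=proxies, timeout=30)

# Google typically redirects blocked requests to a /sorry/ captcha page
blocked = "/sorry/" in resp.url or resp.status_code == 429
print(resp.status_code, "captcha" if blocked else "ok")
```

If this matched request still triggers a captcha on the same proxy, the remaining differences (cookies in particular) are the next thing to check.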
Thanks for the insight. I tried testing again, but this time I used the NET:HTTP parser so I could be sure to use the same user agent. I also used HTTPS, the same Google domain, and the same parameters in the query URL for both. The result was the same (it worked in the browser but not in the software): http://screencast.com/t/dGdZpHqpW So it would seem Google can tell that a browser isn't being used. Maybe something to do with JavaScript running in the browser? How do you use the "Cookie" option in the NET:HTTP parser? Is there a way to see what cookies the browser is using and then use those in the NET:HTTP Cookie option? Just because I'm obsessive, I'd like to test with the same cookie as well, if possible, to satisfy my curiosity.
No JS involved; your browser is simply recognized by its old session (cookies). The Cookie option accepts cookies as a string, so try copying that string from your browser.
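For example (again a minimal sketch in Python rather than the parser itself, with the proxy and cookie value as placeholders): copy the full value of the Cookie request header from the browser's developer tools (Network tab, the request to google.*) and send that exact string with the request:

```python
import requests

# Hypothetical values: replace with your own proxy and the Cookie header string
# copied verbatim from the browser's developer tools.
PROXY = "http://USER:PASS@1.2.3.4:8080"
COOKIE_STRING = "NID=511=abc123...; 1P_JAR=2023-01-01-00"  # placeholder, not a real cookie

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",  # same UA as the browser
    "Cookie": COOKIE_STRING,  # the whole cookie string, exactly as the browser sent it
}

resp = requests.get(
    "https://www.google.co.uk/search",
    params={"q": "test query"},
    headers=headers,
    proxies={"http": PROXY, "https": PROXY},
    timeout=30,
)
print(resp.status_code)
```

If the request succeeds with the browser's cookie string but fails without it, the cookie/session was the difference; if it still gets a captcha, the cause lies elsewhere in the request.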