I know Google is hard to scrape without getting proxies banned, but I'm seeing something strange. When I use the task tester for the Google Parser, my proxies are blocked or shown a captcha. But if I manually take the same proxy and use it in a browser, it works with no blocks. Why would this be? Maybe Google can tell a full browser is being used and lets the proxy through? It doesn't matter whether I enable or disable HTTPS, use sessions or not, etc. Same result. Here is a short video I made showing the above steps in action: http://screencast.com/t/qtc8WL42GlQ5
Please note there are a few differences between your browser and the parser:
- The parser uses HTTP, while the browser uses HTTPS
- The parser always queries the .co.uk domain, while the browser switches between .com, .co.in, etc.
- The set of parameters in the query URL differs between the browser and the parser
- They may use different user agents
- The cookies may also differ
All of this can affect whether Google serves a captcha. To isolate which difference matters, you can reproduce the request outside the parser while matching the browser as closely as possible, as in the sketch below.
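Here is a minimal sketch (in Python with the requests library, just as a stand-in for the parser) of sending the same query through the same proxy while matching the browser's scheme, domain, parameters and user agent. The proxy address, user agent string and query below are placeholders, not values from your setup:

```python
import requests

# Placeholders: substitute your own proxy and the exact values your browser sends
# (copy them from the browser's developer tools, Network tab).
PROXY = "http://USER:PASS@1.2.3.4:8080"
proxies = {"http": PROXY, "https": PROXY}

headers = {
    # Use the exact user agent string from the working browser session
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Accept-Language": "en-GB,en;q=0.9",
}

# Match the scheme, domain and query parameters the browser uses
url = "https://www.google.com/search"
params = {"q": "test query", "hl": "en"}

resp = requests.get(url, headers=headers, params=params, proxies=proxies, timeout=30)

# Google typically redirects blocked requests to a /sorry/ captcha page
blocked = "/sorry/" in resp.url or resp.status_code == 429
print(resp.status_code, "captcha" if blocked else "ok")
```

If this matched request still triggers a captcha on the same proxy, the remaining differences (cookies in particular) are the next thing to check.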
Thanks for the insight. I tried testing again, but this time I used the NET:HTTP parser so I could be sure to use the same user agent. I also used HTTPS, the same Google domain, and the same parameters in the query URL for both. The result was the same (it worked in the browser but not in the software): http://screencast.com/t/dGdZpHqpW So it would seem Google can tell that a browser isn't being used. Maybe something to do with JavaScript running in the browser? How do you use the "Cookie" option in the NET:HTTP parser? Is there a way to see what cookies the browser is using and then use those in the NET:HTTP Cookie option? Just because I'm obsessive, I'd like to test with the same cookie as well, if possible, to satisfy my curiosity.
No JS involved; your browser is simply recognized by its old session (cookies). The Cookie option accepts cookies as a string, so try copying that string from your browser.
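For example (again a minimal sketch in Python rather than the parser itself, with the proxy and cookie value as placeholders): copy the full value of the Cookie request header from the browser's developer tools (Network tab, the request to google.*) and send that exact string with the request:

```python
import requests

# Hypothetical values: replace with your own proxy and the Cookie header string
# copied verbatim from the browser's developer tools.
PROXY = "http://USER:PASS@1.2.3.4:8080"
COOKIE_STRING = "NID=511=abc123...; 1P_JAR=2023-01-01-00"  # placeholder, not a real cookie

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",  # same UA as the browser
    "Cookie": COOKIE_STRING,  # the whole cookie string, exactly as the browser sent it
}

resp = requests.get(
    "https://www.google.co.uk/search",
    params={"q": "test query"},
    headers=headers,
    proxies={"http": PROXY, "https": PROXY},
    timeout=30,
)
print(resp.status_code)
```

If the request succeeds with the browser's cookie string but fails without it, the cookie/session was the difference; if it still gets a captcha, the cause lies elsewhere in the request.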