Strange When Scraping Google

Discussion in 'A-Parser Support Forum' started by scrapefun, May 19, 2016.

  1. scrapefun

    scrapefun A-Parser Enterprise License
    A-Parser Enterprise

    Joined:
    Feb 24, 2015
    Messages:
    184
    Likes Received:
    34
    I know Google is hard to scrape without getting proxies banned, but I'm seeing something strange.

    When using the task tester for the Google Parser, proxies are blocked or shown a CAPTCHA image. But if I manually take the same proxy and use it in a browser, it works with no blocks.

    Why would this be? Maybe Google can tell that a full browser is being used and lets the proxy through?

    It doesn't matter whether I enable or disable HTTPS, use sessions or not, etc. Same result.
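To compare the parser and the browser on equal terms, it helps to pin down what counts as "blocked". A minimal Python sketch, assuming that Google's block page redirects to a path starting with /sorry/, returns HTTP 429 or 503, or contains the phrase "unusual traffic". These markers are observed behaviour, not a documented API, and `looks_like_captcha` is a hypothetical helper, not part of A-Parser.

```python
from urllib.parse import urlparse

# Assumed markers of Google's block/CAPTCHA interstitial (observed behaviour,
# not a documented API; adjust to what your own responses actually show).
BLOCK_STATUS_CODES = {429, 503}
BLOCK_TEXT = "unusual traffic"


def looks_like_captcha(status: int, final_url: str, body: str) -> bool:
    """Hypothetical heuristic: True if a response looks like a block page."""
    # Redirect target of the block page (assumption: /sorry/... path).
    if urlparse(final_url).path.startswith("/sorry/"):
        return True
    # Rate-limit style status codes (assumption).
    if status in BLOCK_STATUS_CODES:
        return True
    # Fall back to searching the page text, case-insensitively.
    return BLOCK_TEXT in body.lower()
```

Running the same check on the parser's response and on the page the browser received gives a like-for-like answer to "was this proxy blocked?".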

    Here is a short video I made showing the above steps in action:
    http://screencast.com/t/qtc8WL42GlQ5
     
  2. Support

    Support Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 16, 2012
    Messages:
    4,557
    Likes Received:
    2,167
    Please note that there are a few differences between your browser and the parser:
    - The parser uses HTTP, while the browser uses HTTPS
    - The parser always uses the .co.uk domain, while the browser switches to .com, .co.in
    - The set of URL parameters differs between the browser and the parser
    - They may use different user agents
    - Cookies may also be the cause
    All of this may well affect whether a CAPTCHA is shown.
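Each of the differences listed above is a request-level setting, so one way to rule them out is to build the request so that every one of them matches the browser. A minimal Python sketch using only the standard library. `build_search_request` and its parameters are hypothetical, the example values are placeholders, and the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(query, domain="www.google.com", scheme="https",
                         user_agent="", cookie="", extra_params=None):
    """Hypothetical helper: build a search request mirroring browser settings."""
    # Same query parameters the browser used (copy them from its address bar).
    params = {"q": query}
    if extra_params:
        params.update(extra_params)
    # Same scheme (HTTP vs HTTPS) and same Google domain as the browser.
    url = f"{scheme}://{domain}/search?{urlencode(params)}"
    headers = {}
    if user_agent:
        headers["User-Agent"] = user_agent  # same user agent as the browser
    if cookie:
        headers["Cookie"] = cookie  # same session cookies as the browser
    return Request(url, headers=headers)


req = build_search_request("test query", domain="www.google.co.uk",
                           user_agent="UA/1.0", extra_params={"hl": "en"})
print(req.full_url)  # https://www.google.co.uk/search?q=test+query&hl=en
```

Changing one argument at a time while keeping the rest identical shows which difference, if any, triggers the CAPTCHA.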
     
  3. scrapefun

    Thanks for the insight.

    I tried testing again, but this time I used the NET:HTTP parser so I could be sure to use the same user agent. I also used HTTPS, the same Google domain, and the same query URL parameters for both.

    The result was the same (it worked in the browser but not in the software):
    http://screencast.com/t/dGdZpHqpW

    So it would seem Google can tell that a browser isn't being used. Maybe it has something to do with JavaScript running in the browser?

    How do you use the "cookie" option in the NET:HTTP parser? Is there a way to see what the browser is using and then use that in the NET:HTTP cookie option? Just because I'm obsessive, I'd like to test using the same cookie as well, if possible, just to satisfy my curiosity :)
     
  4. Forbidden

    Forbidden Administrator
    Staff Member A-Parser Enterprise

    Joined:
    Mar 9, 2013
    Messages:
    3,337
    Likes Received:
    1,795
    It's not JS; your browser just knows your old session (cookie).

    The Cookie option accepts cookies in string format; try copying them from your browser.
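A "string format" cookie is most likely the same `name=value; name=value` form that browsers send in the `Cookie` request header (RFC 6265), which can be copied from the request headers in the browser's developer tools. That this is the exact format the Cookie option expects is an assumption here, not something confirmed in this thread. A small Python sketch for building and checking such a string; the helper names and cookie values are hypothetical.

```python
def to_cookie_string(cookies):
    """Hypothetical helper: join name/value pairs into 'a=1; b=2' form."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def parse_cookie_string(s):
    """Hypothetical helper: split a Cookie header string back into a dict."""
    result = {}
    for part in s.split(";"):
        part = part.strip()
        # Skip empty fragments and anything that is not name=value.
        if not part or "=" not in part:
            continue
        # Split on the first '=' only; values may themselves contain '='.
        name, value = part.split("=", 1)
        result[name.strip()] = value.strip()
    return result


# Placeholder values; paste the real string copied from the browser instead.
s = to_cookie_string({"NID": "abc123", "CONSENT": "YES+1"})
print(s)  # NID=abc123; CONSENT=YES+1
```

Parsing the copied string first is a quick way to confirm that nothing was truncated before pasting it into the parser's Cookie option.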
     
