Customize Google Images Scraper

Discussion in 'A-Parser Support Forum' started by smilesmile, Oct 13, 2018.

  1. smilesmile

    I need to extract some additional information from Google Images results and am not sure how to go about it.

    On the Google Images results page, each image links to a URL like this:

    href="http://www.google.com/imgres?imgurl...BQ&tbm=isch&ved=0CDQQMygCMAI&biw=1366&bih=631"

    I need to extract the values for these parameters:

    imgurl=
    imgrefurl=
    tbnid=

    Finally, is there a way to extract the file type of the image (jpg, png, etc.) into a variable as well, something like $filetype?

    For the final result, I would like each output line to contain:
    $query;$loop.count;$imgurl;$imgrefurl;$tbnid.$filetype\n
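For reference, the three parameters and the file type can be pulled out of such a link with ordinary URL parsing. The following is a minimal Python sketch that runs outside A-Parser, not an A-Parser preset. The function name `parse_imgres` and the sample link are made up for illustration, and the file type is only guessed from the extension of the image URL, so it will be empty for images served without one.

```python
from os.path import splitext
from urllib.parse import parse_qs, urlsplit


def parse_imgres(href):
    """Return imgurl, imgrefurl, tbnid and a guessed filetype from an imgres link."""
    params = parse_qs(urlsplit(href).query)

    def first(name):
        # parse_qs maps each name to a list of values; take the first or "".
        return params.get(name, [""])[0]

    imgurl = first("imgurl")
    # Guess the file type from the extension of the image URL's path.
    ext = splitext(urlsplit(imgurl).path)[1]
    return {
        "imgurl": imgurl,
        "imgrefurl": first("imgrefurl"),
        "tbnid": first("tbnid"),
        "filetype": ext.lstrip(".").lower(),
    }


# Hypothetical link, invented for this example only.
sample = (
    "http://www.google.com/imgres?imgurl=http://example.com/pics/cat.JPG"
    "&imgrefurl=http://example.com/page.html&tbnid=abc123&tbm=isch"
)
fields = parse_imgres(sample)

# Same layout as the requested result line; "cats" and 1 stand in for
# $query and $loop.count.
line = "{};{};{imgurl};{imgrefurl};{tbnid}.{filetype}".format("cats", 1, **fields)
print(line)
```

`parse_qs` also URL-decodes the values, so an encoded `imgurl` comes back as a plain URL.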
     
  2. Support

    The page source is available in the $pages array.
    If you inspect it, you will see that each image is represented by a JSON object that contains all the data you need. The task therefore comes down to scraping these objects and outputting the required fields.
    Code:
    eJxtVE1v2zAM/SsG0aJJEXjoYRf3C2nXbC26uEvaU5QVQswYTmXJk+QshZH/Pkp2
    7CZrDo5JkY+Pj7QqsNy8mSeNBq2BaFZB4d8hggSXvBQWBlBwbVC74xlM76Lou1Kp
    wCi6z3mKhgLa0Arse4GUvCiNVfkETY2g65doRlhNCrcc5u4kxQ0lXCTZuncdBSsj
    eY6XDGa/GcxPGfSvg4XgxpBLp685Wr47ueoxVoWn14xt+xeMfSGEK2gQn2saJm2L
    Nx6uNX8np/8fUynyZRZz0wa6PmFV89vO5617pHTOnTCz42AUT+6Gtz8Clxncj4Pi
    LPQg50wyuzJKBpeBVUqY0Gv3MI3HPRcQetw+hQX0+1Oifg9eg5PzE3oKpYpwoUpp
    W5dDClW5b+sDO0sObOswGTDiAlTpbvwtOJ537U35Gp8V9bHMBHbuEVmNHkdEEt1p
    uPQ99/qh3bgx8iTJbKYkF7UYTqpOoBeZUUeULxXFuuYyNCOtcnJZ9AC+452QMzjy
    tluD0uf+qnMgWnJhcACGqI44EUkOT0hMza3SceH4kL8CJYdCPOIaRRfm8W/KTCS0
    v8MlJd03iZ+HxP9hbNv2PpZao/6riUOL4q2b+GeXlahHle7EeEMsWnnGzpMrjS1i
    A9IUoq+xQJlQZDedYdG59hjvTWDfuVBymaUxcdVZgrvIUj7TJx/LW5UXAl0LshSC
    JmBw0m3C0DSKO6MjeJh860vsXRZ+7R+mNdVCZ7RpXx3BnET7WLWBXHAhXiaPH0+g
    2x6/OcbBLmglU0XL4u4ov00RKO2bHQBuCi4TJH3OtvTNtjdWe69Vn91bUbWlOa3M
    Ux3sOvXOAZBkhibj4P4BnL/CrQ==
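The block above is an encoded A-Parser preset and is not readable as-is. As a rough illustration of the approach Support describes (find the per-image JSON objects in the page source, then output selected fields), here is a Python sketch that runs outside A-Parser. It rests on assumptions the thread does not confirm: that each object sits inside a `div` with class `rg_meta`, and that the keys `ou`, `ru`, `id` and `ity` hold the image URL, referring page, thumbnail id and file type. Check the actual page source and adjust the pattern and key names to match.

```python
import json
import re

# Assumption: each result embeds its JSON object in a div with class "rg_meta".
META_RE = re.compile(r'<div class="rg_meta[^"]*">(\{.*?\})</div>', re.S)


def extract_images(page_html):
    """Collect one dict per image from the JSON objects found in the page."""
    results = []
    for match in META_RE.finditer(page_html):
        try:
            meta = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        results.append({
            # Key names below are assumed, not taken from the thread.
            "imgurl": meta.get("ou", ""),
            "imgrefurl": meta.get("ru", ""),
            "tbnid": meta.get("id", ""),
            "filetype": meta.get("ity", ""),
        })
    return results


# Hypothetical page fragment, invented for this example only.
sample_page = (
    '<div class="rg_meta notranslate">'
    '{"id":"abc123","ity":"jpg","ou":"http://example.com/cat.jpg",'
    '"ru":"http://example.com/page.html"}</div>'
)
rows = extract_images(sample_page)
for count, row in enumerate(rows, start=1):
    print("cats;{};{imgurl};{imgrefurl};{tbnid}.{filetype}".format(count, **row))
```

Reading the file type from the JSON object, when such a field exists, is more reliable than guessing it from the image URL's extension.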
     
