I need to extract some additional information from Google Images results and am not sure how to go about it. On the Google Images results page, each image links to a URL like this:

href="http://www.google.com/imgres?imgurl...BQ&tbm=isch&ved=0CDQQMygCMAI&biw=1366&bih=631"

I need to extract the values of these parameters:

imgurl=
imgrefurl=
tbnid=

Finally, is there a way to extract the file type of the image (jpg, png, etc.) into a variable as well, something like $filetype?

So for the final result I would like each line to contain:

$query;$loop.count;$imgurl;$imgrefurl;$tbnid.$filetype\n
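For checking the extraction outside A-Parser, the same parameters can be pulled from an imgres link with Python's standard library. This is only a minimal sketch, not how the parser does it internally: it assumes the link may have been copied from raw HTML (so "&amp;" is turned back into "&" first), it guesses the file type from the extension of the imgurl path, and the names parse_imgres and format_line are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit
import posixpath

# Extensions treated as image file types
KNOWN_TYPES = {"jpg", "jpeg", "png", "gif"}


def parse_imgres(href):
    """Extract imgurl, imgrefurl, tbnid and a guessed file type from an imgres link."""
    # Links taken from raw HTML separate parameters with "&amp;"
    query = urlsplit(href.replace("&amp;", "&")).query
    # parse_qs percent-decodes the values and returns a list per key
    params = parse_qs(query)

    def first(key):
        return params.get(key, [""])[0]

    imgurl = first("imgurl")
    # Assumption: tbnid may carry a trailing ':' in these links, so strip it
    tbnid = first("tbnid").rstrip(":")
    # Guess the type from the extension of the image URL's path, e.g. ".PNG" -> "png"
    ext = posixpath.splitext(urlsplit(imgurl).path)[1].lstrip(".").lower()
    return {
        "imgurl": imgurl,
        "imgrefurl": first("imgrefurl"),
        "tbnid": tbnid,
        "filetype": ext if ext in KNOWN_TYPES else None,
    }


def format_line(query, count, r, default_type="jpg"):
    """Build one '$query;$loop.count;$imgurl;$imgrefurl;$tbnid.$filetype' line."""
    # Assumed fallback when the image URL has no recognizable extension
    filetype = r["filetype"] or default_type
    return f"{query};{count};{r['imgurl']};{r['imgrefurl']};{r['tbnid']}.{filetype}"
```

Guessing the type from the extension will not work for image URLs such as .../img?id=5; the "jpg" fallback in that case is an arbitrary choice, not something Google reports.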
I know Forbidden is super busy, so I would be open to hiring someone to build this. If anyone is interested, just send me a PM.
A very interesting solution:

1. Use SE::Google::Images with Raw data results to run the queries and get the raw HTML.
2. Use a complex regex to pull out all the data.
3. Use the power of Result format to generate a properly formatted result.

Preset to import:

eyJwcmVzZXQiOiJ0b3BpYy0xNjA5OiBjdXN0b20gZ29vZ2xlIGltYWdlcyBwYXJz
ZXIiLCJ2YWx1ZSI6eyJwcmVzZXQiOiJ0b3BpYy0xNjA5OiBjdXN0b20gZ29vZ2xl
IGltYWdlcyBwYXJzZXIiLCJwYXJzZXJzIjpbWyJTRTo6R29vZ2xlOjpJbWFnZXMi
LCJkZWZhdWx0Iix7InR5cGUiOiJvdmVycmlkZSIsImlkIjoicmF3ZGF0YSIsInZh
bHVlIjp0cnVlfSx7InR5cGUiOiJjdXN0b21SZXN1bHQiLCJyZXN1bHQiOlsicGFn
ZXMiLCJkYXRhIl0sInJlZ2V4IjoiaW1ndXJsPShbXiZdKj8oPzpcXC4oanBlP2d8
cG5nfGdpZikpPykmYW1wO2ltZ3JlZnVybD0oW14mXSspJi4qP3RibmlkPShbXjpd
Kyk6IiwicmVnZXhUeXBlIjoiaWciLCJyZXN1bHRUeXBlIjoiYXJyYXkiLCJhcnJh
eU5hbWUiOiJpbWdzIiwicmVzdWx0cyI6WyJsaW5rIiwidHlwZSIsInJlZiIsInRi
bmlkIl19LHsidHlwZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRyZXN1bHQiLCJ2
YWx1ZSI6IlslIEZPUkVBQ0ggaW1ncyAtJV1cbiRxdWVyeTskbG9vcC5jb3VudDsk
bGluazskcmVmOyR7dGJuaWR9LlslIHR5cGUgPT0gJ25vbmUnID8gJ2RlZmF1bHQu
anBnJyA6IHR5cGUgJV0gXG5bJSBFTkQgJV0ifV1dLCJyZXN1bHRzRm9ybWF0Ijoi
JHAxLnByZXNldCIsInJlc3VsdHNTYXZlVG8iOiJmaWxlIiwicmVzdWx0c0ZpbGVO
YW1lIjoiJGRhdGVmaWxlLmZvcm1hdCgpLnR4dCIsImFkZGl0aW9uYWxGb3JtYXRz
IjpbXSwicmVzdWx0c1VuaXF1ZSI6Im5vIiwicXVlcnlGb3JtYXQiOlsiJHF1ZXJ5
Il0sInVuaXF1ZVF1ZXJpZXMiOmZhbHNlLCJzYXZlRmFpbGVkUXVlcmllcyI6ZmFs
c2UsIml0ZXJhdG9yT3B0aW9ucyI6eyJvbkFsbExldmVscyI6ZmFsc2UsInF1ZXJ5
QnVpbGRlcnNBZnRlckl0ZXJhdG9yIjpmYWxzZX0sInJlc3VsdHNPcHRpb25zIjp7
Im92ZXJ3cml0ZSI6ZmFsc2V9LCJkb0xvZyI6Im5vIiwia2VlcFVuaXF1ZSI6Ik5v
IiwibW9yZU9wdGlvbnMiOmZhbHNlLCJyZXN1bHRzUHJlcGVuZCI6IiIsInJlc3Vs
dHNBcHBlbmQiOiIiLCJxdWVyeUJ1aWxkZXJzIjpbXSwicmVzdWx0c0J1aWxkZXJz
IjpbXSwiY29uZmlnT3ZlcnJpZGVzIjpbXX19
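The preset string is base64-encoded JSON. Decoded, its custom result applies a regex to the raw page with the "ig" flags (presumably case-insensitive and global) and stores the captures in an array as link, type, ref and tbnid, which Result format then loops over. A rough Python equivalent of that regex step is sketched below, useful for testing the pattern against saved HTML. It is not A-Parser code: extract_lines is a hypothetical name, and the "default.jpg" fallback copies what the preset's Result format appears to output when no extension is captured.

```python
import re

# Python translation of the preset's pattern; it runs on raw HTML, hence "&amp;".
# Groups: 1 = image URL, 2 = extension (optional), 3 = referring page, 4 = tbnid
IMG_RE = re.compile(
    r"imgurl=([^&]*?(?:\.(jpe?g|png|gif))?)&amp;imgrefurl=([^&]+)&.*?tbnid=([^:]+):",
    re.IGNORECASE,
)


def extract_lines(query, raw_html, fallback="default.jpg"):
    """Return one '$query;$loop.count;$link;$ref;$tbnid.$type' line per image."""
    lines = []
    # enumerate(..., 1) mimics Template Toolkit's 1-based $loop.count
    for count, m in enumerate(IMG_RE.finditer(raw_html), 1):
        link, ext, ref, tbnid = m.groups()
        # ext is None when imgurl has no .jpg/.jpeg/.png/.gif suffix
        lines.append(f"{query};{count};{link};{ref};{tbnid}.{ext or fallback}")
    return lines
```

Because the extension group is optional, links whose image URL lacks a recognizable suffix are still captured; only the type is missing.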
Thanks! This works great. Is it possible to use the results of one parser as the queries for another parser? I saw in the help files that this was not possible when that page was written, but I wondered whether it is possible yet. Basically, I want to use the Net::HTTP parser to download the actual images from Google Images. I got it working as a standalone task, but I would like to use the "$link" result value from the Google Images parser as the query for the Net::HTTP parser. Thanks again for your help!
OK. What I am doing now is creating an additional result file when scraping Google Images that contains only the image URLs, and then I use those as the $query for the Net::HTTP parser in a separate task. With this method, however, I can't match each image to its original keyword query. I want to name the images with the query from the Google Images task. How do I match each image to the correct query from the Google Images parser task?
Select the result file obtained in the previous task as the query file. The result is an img folder containing the pictures, named by keyword and number.
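The key point is that each result line from the first task already carries the keyword and loop count, so the file name can be derived from the line itself. For anyone doing the download step outside A-Parser, a hypothetical Python helper is sketched below. It assumes the result lines use the $query;$loop.count;$link;$ref;$tbnid.$type layout produced by the preset above and that the query and link contain no ";"; the img/keyword_N naming follows the result described here, while the file-name cleaning rule is an arbitrary assumption.

```python
import os
import re
import urllib.request


def safe_name(text):
    """Turn a keyword into a file-system-friendly name (arbitrary rule)."""
    return re.sub(r"[^\w-]+", "_", text).strip("_") or "query"


def plan_downloads(result_lines, folder="img"):
    """Map each 'query;count;link;ref;tbnid.ext' line to (link, target path)."""
    plan = []
    for line in result_lines:
        line = line.strip()
        if not line:
            continue
        # Assumes query and link contain no ';'; the last field is 'tbnid.ext'
        parts = line.split(";")
        query, count, link, name = parts[0], parts[1], parts[2], parts[-1]
        ext = os.path.splitext(name)[1] or ".jpg"
        plan.append((link, os.path.join(folder, f"{safe_name(query)}_{count}{ext}")))
    return plan


def download(plan):
    """Fetch every planned image (needs network access)."""
    for link, path in plan:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        urllib.request.urlretrieve(link, path)
```

Planning and downloading are kept separate so the keyword-to-file mapping can be checked without touching the network.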