I think I'm having a character encoding problem, but I can't tell whether it is in how A-Parser is set up or something on my server. I have a custom Net::HTTP parser that queries Google and saves the results page as an HTML file. It works great, but for some reason a few queries will not save as a file. They appear to work fine in the parser test, and the queries aren't failing; they simply aren't being saved. All the failing queries are non-English words/phrases. I've included the parser code below, but the one way I was able to get it to work is to change this line for the results file name:
serp_raw/[% IF p1.info.success == 1 %][% USE Math; "test_4"_ Math.int(query.num / 2500) _"/"_ query _".html" %][% END %]
To this:
serp_raw/[% IF p1.info.success == 1 %][% USE Math; "test_4"_ Math.int(query.num / 2500) _"/test.html" %][% END %]
With the updated line I can run one query at a time and the file is generated, but of course the file naming is no longer dynamic. I can then manually rename the files with the correct query name with no problems.
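Because the query text itself becomes part of the file path in the first template, any character the file system rejects can make the save fail while the query itself succeeds. The thread doesn't show how A-Parser builds file names internally, so as an illustration outside A-Parser here is a minimal Python sketch of turning a raw query into a Windows-safe file name. The helper name query_to_filename is hypothetical; it keeps Unicode text (and "%", which Windows allows) and only replaces characters Windows forbids. It does not handle reserved device names such as CON or NUL.

```python
import re

# Characters Windows forbids in file names, plus ASCII control characters.
# "%" is allowed on Windows, so it is deliberately NOT in this set.
_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def query_to_filename(query: str, ext: str = ".html") -> str:
    """Hypothetical helper: map a raw query to a Windows-safe file name."""
    name = _RESERVED.sub("_", query)
    # Windows also rejects names that end in a space or a dot.
    name = name.rstrip(" .")
    # Fall back to "_" if nothing usable is left.
    return (name or "_") + ext


if __name__ == "__main__":
    for q in ["g♭ major", "100% cotton", "a/b?c", "..."]:
        print(q, "->", query_to_filename(q))
```

Non-English text passes through unchanged here; whether it then saves correctly depends on the program writing the file using Unicode-aware file APIs, which is an assumption about the environment rather than something the sketch can guarantee.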
Here is the parser code that includes the queries failing: Code: eyJwcmVzZXQiOiJUZXN0IC0gUkFXIEhUTUwiLCJ2YWx1ZSI6eyJwcmVzZXQiOiJU ZXN0IC0gUkFXIEhUTUwiLCJwYXJzZXJzIjpbWyJOZXQ6OkhUVFAiLCJkZWZhdWx0 Iix7InR5cGUiOiJvdmVycmlkZSIsImlkIjoidXNlci1hZ2VudCIsInZhbHVlIjoi TW96aWxsYS81LjAgKFdpbmRvd3MgTlQgNi4xOyBXT1c2NCkgQXBwbGVXZWJLaXQv NTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzQ3LjAuMjUyNi4xMTEg U2FmYXJpLzUzNy4zNiJ9LHsidHlwZSI6Im92ZXJyaWRlIiwiaWQiOiJmb3JtYXRy ZXN1bHQiLCJ2YWx1ZSI6IlslIElGIGluZm8uc3VjY2VzcyA9PSAxICVdJHBhZ2Vz LmZvcm1hdCgnJGRhdGFcXG4nKVslIEVORCAlXSJ9LHsidHlwZSI6Im92ZXJyaWRl IiwiaWQiOiJwcm94eXJldHJpZXMiLCJ2YWx1ZSI6IjIwMCJ9LHsidHlwZSI6Im92 ZXJyaWRlIiwiaWQiOiJ1c2Vwcm94eSIsInZhbHVlIjp0cnVlfSx7InR5cGUiOiJv dmVycmlkZSIsImlkIjoiZ29vZENvZGUiLCJ2YWx1ZSI6WzIwMF19LHsidHlwZSI6 Im92ZXJyaWRlIiwiaWQiOiJwcm94eWJhbm5lZGNsZWFudXAiLCJ2YWx1ZSI6IjAi fSx7InR5cGUiOiJvdmVycmlkZSIsImlkIjoicXVlcnlmb3JtYXQiLCJ2YWx1ZSI6 Imh0dHBzOi8vd3d3Lmdvb2dsZS5jb20vc2VhcmNoP3E9JHF1ZXJ5JnB3cz0wJnV1 bGU9dytDQUlRSUNJTlZXNXBkR1ZrSUZOMFlYUmxjdyJ9LHsidHlwZSI6Im92ZXJy aWRlIiwiaWQiOiJyZXF1ZXN0ZGVsYXkiLCJ2YWx1ZSI6IjAifSx7InR5cGUiOiJv dmVycmlkZSIsImlkIjoidGltZW91dCIsInZhbHVlIjoiMzAifV1dLCJyZXN1bHRz Rm9ybWF0IjoiJHAxLnByZXNldCIsInJlc3VsdHNTYXZlVG8iOiJmaWxlIiwicmVz dWx0c0ZpbGVOYW1lIjoic2VycF9yYXcvWyUgSUYgcDEuaW5mby5zdWNjZXNzID09 IDEgJV1bJSBVU0UgTWF0aDsgXCJ0ZXN0XzRcIl8gTWF0aC5pbnQocXVlcnkubnVt IC8gMjUwMCkgX1wiL1wiXyBxdWVyeSBfXCIuaHRtbFwiICVdWyUgRU5EICVdIiwi YWRkaXRpb25hbEZvcm1hdHMiOltbImZhaWxlZC9mYWlsZWQudHh0IiwiWyUgSUYg cDEuaW5mby5zdWNjZXNzID09IDAgJV0kcXVlcnlcXG5bJSBFTkQgJV0iXV0sInJl c3VsdHNVbmlxdWUiOiJubyIsInF1ZXJpZXNGcm9tIjoidGV4dCIsInF1ZXJ5Rm9y bWF0IjpbIiRxdWVyeSJdLCJ1bmlxdWVRdWVyaWVzIjp0cnVlLCJzYXZlRmFpbGVk UXVlcmllcyI6ZmFsc2UsIml0ZXJhdG9yT3B0aW9ucyI6eyJvbkFsbExldmVscyI6 ZmFsc2UsInF1ZXJ5QnVpbGRlcnNBZnRlckl0ZXJhdG9yIjpmYWxzZSwicXVlcnlC dWlsZGVyc09uQWxsTGV2ZWxzIjpmYWxzZX0sInJlc3VsdHNPcHRpb25zIjp7Im92 ZXJ3cml0ZSI6ZmFsc2V9LCJkb0xvZyI6ImRiIiwia2VlcFVuaXF1ZSI6Ik5vIiwi 
bW9yZU9wdGlvbnMiOmZhbHNlLCJyZXN1bHRzUHJlcGVuZCI6IiIsInJlc3VsdHNB cHBlbmQiOiIiLCJxdWVyeUJ1aWxkZXJzIjpbXSwicmVzdWx0c0J1aWxkZXJzIjpb XSwiY29uZmlnT3ZlcnJpZGVzIjpbXSwicXVlcmllcyI6Ilx1YmU0NVx1YmM0NW1v bnN0ZXJcdWI0ZTNcdWFlMzBcblx1MDQzNFx1MDQ0ZFx1MDQ0MyBcdTA0NDJcdTA0 MzhcdTA0M2FcdTA0M2UgXHUwNDNlXHUwNDQyXHUwNDM3XHUwNDRiXHUwNDMyXHUw NDRiXG5nXHUyNjZkIG1ham9yXG5kciBwZXJvIHZyXHUwMTdlb2dpXHUwMTA3In19
I'm also having trouble with queries containing special characters: the "&", "+" and "#" characters don't seem to be passed/encoded properly. Also, when saving the file, if a query contains a "%" it isn't used in the file name, even though "%" is a valid character in Windows file names. Even if I encode the query myself it still doesn't work properly. I need to keep the queries in their original, unencoded form in my query list; I only tested with pre-encoded queries to see what would happen.
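The usual way to keep the query list raw and still send "&", "+", "#" and "%" correctly is to percent-encode each query only at the moment the request URL is built. The following Python sketch (not A-Parser code; search_url and the BASE constant are illustrative assumptions modelled on a google.com/search URL) uses urllib.parse.urlencode, which encodes values with quote_plus by default:

```python
from urllib.parse import urlencode

# Illustrative base URL; the real task builds a similar google.com/search URL.
BASE = "https://www.google.com/search"


def search_url(query: str) -> str:
    """Build a search URL from a raw, unencoded query (hypothetical helper)."""
    # urlencode percent-encodes each value: space -> "+", "&" -> "%26",
    # "+" -> "%2B", "#" -> "%23", "%" -> "%25"; non-ASCII text is sent as UTF-8.
    return BASE + "?" + urlencode({"q": query, "pws": "0"})


# The query list stays in its original form; only the URL is encoded.
queries = ["c++ tutorial", "salt & pepper", "c# list", "100% cotton"]
for q in queries:
    print(q, "->", search_url(q))
```

Encoding a query that was already encoded turns "%" into "%25" a second time, which is one reason pre-encoding the list by hand tends to make things worse rather than better.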
I'm working on this issue; a new version will be released soon. For the special characters, you have to apply the escape filter to the query. The other issue will be fixed as well.
Thanks for the help! For the escape filter, do I need to create a filter for each character I'm having issues with, or is there a way to specify multiple characters in a single filter? Also, I'm not clear on where to put this in my task settings. The screenshot showing the example is for the Google parser, but I am using the Net::HTTP parser. Looking forward to the update for the other issues. Fantastic support as always!
Haha... couldn't be much easier than that! Will you post here when the update is ready, or should I just check the RU forum for the latest updates? Thanks for the help!
I tested with the latest update, and the file naming issues I was having seem to be fixed. Most characters are passed correctly after applying the escape filter, but I'm still having problems with some words, mainly those containing "+", "&", "<" and ">". Here are some examples, with the original phrase to the left of the "=" and what appears in the downloaded Google results page to the right:
Granted, some of these are pretty much nonsense for testing purposes, but I need to be able to submit these characters properly. It could very well be that I'm doing something wrong on my end, of course.
These aren't valid characters in HTML text: you can't use "<", ">" (and several other symbols, such as "&") directly in HTML, so they are written as &lt; &gt; &amp; etc. These are called "HTML entities". You will get exactly the same thing from Google in a browser.
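To compare a raw query with what is stored in a saved results page, the query has to be converted to its HTML-entity form first. A small Python sketch using the standard html module shows the round trip; as_in_html is a hypothetical helper name, and the exact entity Google emits for quote characters may differ from Python's choice.

```python
import html


def as_in_html(text: str) -> str:
    """Return text the way it typically appears in raw HTML source (hypothetical helper)."""
    # html.escape replaces "&" first, then "<", ">", and (by default) both quote
    # characters, so the "&" inside the new entities is not escaped again.
    return html.escape(text)


raw = "fish & chips <b>"
encoded = as_in_html(raw)
print(encoded)                        # fish &amp; chips &lt;b&gt;
print(html.unescape(encoded) == raw)  # True
```

Searching a downloaded page in a text editor for the escaped form (or opening it in a browser, which renders the entities) gives the match you expect.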
Thanks for the explanation. I was checking in my text editor instead of a browser, so I didn't see those rendered correctly. This explains what I was seeing with all the characters except "+". When I view the files for those queries in either my text editor or a browser, the "+" signs are simply missing, as if they had been left off. This happens for every query containing a "+". Thanks for all the help and patience.
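One likely explanation (not confirmed in the thread) is that the "+" reaches Google unencoded. In a URL query string, a raw "+" is decoded as a space, so "c++" arrives as "c" followed by spaces and the "+" signs seem to vanish. This Python sketch uses urllib.parse to show how a server-side decoder reads both forms; the example query is made up.

```python
from urllib.parse import parse_qs, quote_plus

query = "c++ tutorial"

# Naive: only spaces turned into "+", the literal "+" signs left raw.
naive = "q=" + query.replace(" ", "+")   # q=c+++tutorial
# Correct: every reserved character percent-encoded.
proper = "q=" + quote_plus(query)        # q=c%2B%2B+tutorial

# parse_qs decodes the way a web server does: "+" -> space, "%2B" -> "+".
print(repr(parse_qs(naive)["q"][0]))   # 'c   tutorial'
print(repr(parse_qs(proper)["q"][0]))  # 'c++ tutorial'
```

If the filter in use leaves "+" untouched, encoding it as %2B before the query is inserted into the URL should make the "+" survive.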