
SE::Dogpile::Images - Dogpile image parser

Overview of the Dogpile image parser


A parser for Dogpile image search results. With the SE::Dogpile::Images parser, you can build databases of image links or collect images for further use. Queries use the same format you would type into the Dogpile search bar.

A-Parser lets you save the Dogpile parser settings for reuse (presets), schedule parsing runs, and much more. To get the maximum possible number of results, you can use automatic query multiplication, substitution of subqueries from files, and enumeration of alphanumeric combinations and lists.

You can save results in whatever form and structure you need, thanks to the powerful built-in Template Toolkit. It lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Data collected

  • Links to images
  • Links to the pages where the images appear
  • Height and width
  • Links to thumbnails


Use cases

  • Collecting images to fill your blogs
  • Collecting images to fill your websites
  • Collecting avatar databases

Queries

  • You must specify search phrases as queries, for example:
Remington  
Tiger
Romeo and Juliet
Quantum mechanics

Query substitutions

You can use built-in macros to multiply queries. For example, to get a very large database of forums, specify several main queries in different languages:

forum
форум
foro
่ฎบๅ›

In the query format, specify an enumeration of characters from a to zzzz. This rotates the search results as much as possible and yields many new unique results:

$query {az:a:zzzz}

This macro will create 475254 additional queries for each original search query.
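
The figure 475254 is the number of strings made of one to four lowercase Latin letters: 26 + 26² + 26³ + 26⁴. The Python sketch below reproduces that count and shows how a single query could expand. It makes two assumptions. First, the macro enumerates a..z, then aa..zz, and so on up to zzzz. Second, each generated value is appended to the query after a space, as in the `$query {az:a:zzzz}` format. `az_suffixes` and `expand` are hypothetical helpers, not part of A-Parser.

```python
from itertools import islice, product
from string import ascii_lowercase


def az_suffixes(max_len: int):
    """Yield a..z, then aa..zz, and so on up to strings of max_len letters.

    Assumption: this mirrors the order {az:a:zzzz} uses; only the count is documented.
    """
    for length in range(1, max_len + 1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)


def expand(query: str, max_len: int = 4):
    """Hypothetical expansion of "$query {az:a:zzzz}" for one original query."""
    for suffix in az_suffixes(max_len):
        yield f"{query} {suffix}"


# Closed-form count: 26 + 26**2 + 26**3 + 26**4
total = sum(26 ** k for k in range(1, 5))
print(total)  # 475254
print(list(islice(expand("forum"), 3)))  # ['forum a', 'forum b', 'forum c']
```

Each additional letter of length multiplies the number of combinations by 26. This is why the four-letter strings alone account for 456976 of the generated queries.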

Results

  • The result is a list of image links for the queries:
http://crossexamined.org/wp-content/uploads/2014/06/Quantum_Computer.jpg  
https://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Hydrogen_Density_Plots.png/1200px-Hydrogen_Density_Plots.png
http://3.bp.blogspot.com/-7mo9xgi0zZ0/VDcYLKYsZmI/AAAAAAAABc8/toMaFUqtcEc/s1600/24854-quantum-mechanics.jpg
http://4.bp.blogspot.com/-FnufNdvAIAI/T6GAIsE9QrI/AAAAAAAADgs/ini4LJG_Nes/s1600/A+Mass+&+Energy.jpg
http://40.media.tumblr.com/tumblr_ma6rb5smWd1rx06nvo1_1280.jpg
https://media.buzzle.com/media/images-en/gallery/education/physics/1200-261760-basics-of-quantum-mechanics.jpg
https://wonderopolis.org/wp-content/uploads/2017/03/Quantum_Physicsdreamstime_xxl_60222747.jpg
https://cdn.wallpapersafari.com/20/6/FkmvcC.gif
https://media.buzzle.com/media/images-en/gallery/education/chemistry/1200-96168909-atoms-emit-light.jpg
http://www.therealityfiles.com/wp-content/uploads/2014/12/quantum_mechanics.jpg
http://i.dailymail.co.uk/i/pix/2010/03/18/article-1258932-014CFA7D000004B0-375_468x462.jpg
https://cdn.wallpapersafari.com/7/34/jXU8Ay.gif
http://mednorthwest.com/wp-content/uploads/2015/09/Quantum-entanglement-wave-particle.jpg
http://steve-patterson.com/wp-content/uploads/2015/02/QuantumPhysics2.jpg
http://cdn1.collective-evolution.com/assets/uploads/2016/07/QuantumPhysics-759x500.jpg

Result output options

Result format:

[% FOREACH item IN serp;
tools.CSVline(query, item.link, item.width, item.height, item.page, item.thumb);
END;
%]

Result example:

cats,https://cdn2.theweek.co.uk/sites/theweek/files/2017/11/131117-wd-cats.jpg,1400,788,https://www.theweek.co.uk/94877/why-are-so-many-australian-towns-introducing-cat-curfews,https://tse3.mm.bing.net/th?id=OIP.iYyPimFLj1_wgKEsTsggQgHaEK&pid=Api
cats,http://mymodernmet.com/wp/wp-content/uploads/2017/03/gabrielius-khiterer-stray-cats-8.jpg,750,1028,https://mymodernmet.com/gabrielius-khiterer-stray-cat-photos/,https://tse2.mm.bing.net/th?id=OIP.ZjfS8JQc9sahsK0-w8dRFAHaKJ&pid=Api
cats,https://www.israelhayom.com/wp-content/uploads/2020/04/why-cats-are-best-pets-worshipped-animals-1559234295.jpg,2119,1415,https://www.israelhayom.com/2020/04/23/2-nyc-cats-test-positive-for-coronavirus-officials-recommend-pet-precautions/,https://tse1.mm.bing.net/th?id=OIP.U7274nc_llbuQTChXpKVNgHaE8&pid=Api
cats,http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg,1204,1445,http://fishsubsidy.org/category/cat/cat-breeds/,https://tse3.mm.bing.net/th?id=OIP.uHEu4-5TLJ6SSgDree6ahQHaI4&pid=Api
cats,https://external-preview.redd.it/gxbKXOj-OF1_RSHa7Ncp8Gs_OFFP5i6V7SU5DPT2t1E.jpg?auto=webp&s=b6e85ba0f1517dc629d21208a7d9db992d550ba9,1920,2560,https://www.reddit.com/r/cats/comments/2k2pio/my_very_ugly_cat/,https://tse1.mm.bing.net/th?id=OIP.t2BxlpEwcGrXJJQSToWVBAHaJ4&pid=Api
cats,http://www.zastavki.com/pictures/originals/2013/Animals_Cats_Sleeping_gray_kitten_036760_.jpg,2560,1600,http://www.zastavki.com/eng/Animals/Cats/wallpaper-36760.htm,https://tse4.mm.bing.net/th?id=OIP.3c_ISLWidlMWXHfjqkpB2wHaEo&pid=Api
cats,https://d.ibtimes.co.uk/en/full/1457779/cats-dont-need-owners.jpg,720,1280,https://www.ibtimes.co.uk/cats-prefer-their-owners-other-people-dont-need-them-feel-safe-1518912,https://tse1.mm.bing.net/th?id=OIP.COdza3KGEWHT3uo9gJ5-0QCoEs&pid=Api
cats,https://img.webmd.com/dtmcms/live/webmd/consumer_assets/site_images/article_thumbnails/reference_guide/why_cats_sneeze_ref_guide/1800x1200_why_cats_sneeze_ref_guide.jpg,1800,1200,https://pets.webmd.com/cats/why-cats-sneeze,https://tse4.mm.bing.net/th?id=OIP.6C8jTceMZG78kseu8RUyfAHaE8&pid=Api
cats,http://mcdaniel.hu/wp-content/uploads/2015/01/6784063-cute-cats-hd.jpg,2560,1600,http://mcdaniel.hu/cat-adoption-101/,https://tse4.mm.bing.net/th?id=OIP.QdEkrZjd1c_VN_aUtleoFgHaEo&pid=Api
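
When you load the CSV output into another tool, the columns follow the order passed to `tools.CSVline` in the template: query, link, width, height, page, thumb. Below is a minimal Python sketch built on that assumption. `read_results` and the field names are illustrative. Width and height are converted to integers so that images can be filtered by size.

```python
import csv
import io

# Column order taken from the tools.CSVline(...) call in the template above.
FIELDS = ["query", "link", "width", "height", "page", "thumb"]


def read_results(text: str) -> list[dict]:
    """Parse CSV lines produced by the template into dictionaries."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, record))
        row["width"] = int(row["width"])
        row["height"] = int(row["height"])
        rows.append(row)
    return rows


sample = (
    "cats,https://cdn2.theweek.co.uk/sites/theweek/files/2017/11/131117-wd-cats.jpg,"
    "1400,788,https://www.theweek.co.uk/94877/why-are-so-many-australian-towns-introducing-cat-curfews,"
    "https://tse3.mm.bing.net/th?id=OIP.iYyPimFLj1_wgKEsTsggQgHaEK&pid=Api\n"
    "cats,http://mymodernmet.com/wp/wp-content/uploads/2017/03/gabrielius-khiterer-stray-cats-8.jpg,"
    "750,1028,https://mymodernmet.com/gabrielius-khiterer-stray-cat-photos/,"
    "https://tse2.mm.bing.net/th?id=OIP.ZjfS8JQc9sahsK0-w8dRFAHaKJ&pid=Api\n"
)
rows = read_results(sample)
# Keep only landscape images (wider than tall).
landscape = [r["link"] for r in rows if r["width"] > r["height"]]
```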

Outputting results in JSON

Result format:

[%  data = {};
data.query = query;
data.images = [];
FOREACH item IN serp;
image = {};
image.width = item.width;
image.height = item.height;
image.link = item.link;
image.page = item.page;
image.thumb = item.thumb;
data.images.push(image);
END;
result = {};
result = data;
data.json
%]

Result example:

{
"images": [
{
"link": "https://viralcats.net/blog/wp-content/uploads/2017/12/Mean-looking-cat-Viral-Cats-03.jpg",
"width": "462",
"page": "https://viralcats.net/blog/2017/12/30/10-kitties-that-you-dont-want-to-mess-with/",
"thumb": "https://tse2.mm.bing.net/th?id=OIP.AdkhgipoWbJwiQBp9VIWpgAAAA&pid=Api",
"height": "722"
},
{
"link": "http://mymodernmet.com/wp/wp-content/uploads/2017/03/gabrielius-khiterer-stray-cats-8.jpg",
"width": "750",
"page": "https://mymodernmet.com/gabrielius-khiterer-stray-cat-photos/",
"thumb": "https://tse2.mm.bing.net/th?id=OIP.ZjfS8JQc9sahsK0-w8dRFAHaKJ&pid=Api",
"height": "1028"
},
{
"link": "http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg",
"width": "1204",
"page": "http://fishsubsidy.org/category/cat/cat-breeds/",
"thumb": "https://tse3.mm.bing.net/th?id=OIP.uHEu4-5TLJ6SSgDree6ahQHaI4&pid=Api",
"height": "1445"
}
],
"query": "cats"
}
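
Any standard JSON library can consume this output. The Python sketch below picks the largest image by pixel area. In the example above, `width` and `height` are strings, so the sketch converts them before comparing. `largest_image` is a hypothetical helper, and the sample input is trimmed from the result example.

```python
import json


def largest_image(raw: str) -> str:
    """Return the link of the image with the largest width x height."""
    data = json.loads(raw)
    # width and height are strings in the sample output, so convert them first.
    best = max(data["images"], key=lambda img: int(img["width"]) * int(img["height"]))
    return best["link"]


raw = json.dumps({
    "query": "cats",
    "images": [
        {"link": "https://viralcats.net/blog/wp-content/uploads/2017/12/Mean-looking-cat-Viral-Cats-03.jpg",
         "width": "462", "height": "722"},
        {"link": "http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg",
         "width": "1204", "height": "1445"},
    ],
})
print(largest_image(raw))  # http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg
```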

Saving in SQL format

Result format:

[% FOREACH p1.serp;
   "INSERT INTO serp VALUES('" _ query _ "', '"; link _ "', '"; page _ "', '"; thumb _ "')\n";
END %]

Result example:

INSERT INTO serp VALUES('cats', 'https://viralcats.net/blog/wp-content/uploads/2017/12/Mean-looking-cat-Viral-Cats-03.jpg', 'https://viralcats.net/blog/2017/12/30/10-kitties-that-you-dont-want-to-mess-with/', 'https://tse2.mm.bing.net/th?id=OIP.AdkhgipoWbJwiQBp9VIWpgAAAA&pid=Api')
INSERT INTO serp VALUES('cats', 'http://mymodernmet.com/wp/wp-content/uploads/2017/03/gabrielius-khiterer-stray-cats-8.jpg', 'https://mymodernmet.com/gabrielius-khiterer-stray-cat-photos/', 'https://tse2.mm.bing.net/th?id=OIP.ZjfS8JQc9sahsK0-w8dRFAHaKJ&pid=Api')
INSERT INTO serp VALUES('cats', 'http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg', 'http://fishsubsidy.org/category/cat/cat-breeds/', 'https://tse3.mm.bing.net/th?id=OIP.uHEu4-5TLJ6SSgDree6ahQHaI4&pid=Api')
INSERT INTO serp VALUES('cats', 'https://cdn2.theweek.co.uk/sites/theweek/files/2017/11/131117-wd-cats.jpg', 'https://www.theweek.co.uk/94877/why-are-so-many-australian-towns-introducing-cat-curfews', 'https://tse3.mm.bing.net/th?id=OIP.iYyPimFLj1_wgKEsTsggQgHaEK&pid=Api')
INSERT INTO serp VALUES('cats', 'https://www.israelhayom.com/wp-content/uploads/2020/04/why-cats-are-best-pets-worshipped-animals-1559234295.jpg', 'https://www.israelhayom.com/2020/04/23/2-nyc-cats-test-positive-for-coronavirus-officials-recommend-pet-precautions/', 'https://tse1.mm.bing.net/th?id=OIP.U7274nc_llbuQTChXpKVNgHaE8&pid=Api')
INSERT INTO serp VALUES('cats', 'https://s-i.huffpost.com/gen/964776/images/o-CATS-KILL-BILLIONS-facebook.jpg', 'https://www.huffingtonpost.com/2013/01/30/domestic-cats-kill-billions-mice-birds-annually-study_n_2575833.html', 'https://tse1.mm.bing.net/th?id=OIP.ETFxELWtgKQwMlcoccq-SAHaHa&pid=Api')
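
The template above splices values directly into the SQL text. A query or URL that contains a single quote would therefore produce a broken statement. If you load the results yourself instead, parameterized inserts avoid this problem. The following is a Python sketch using the standard `sqlite3` module. The four-column `serp` schema and its column names are assumptions inferred from the values in the example statements. The query `cat's eyes` is made up to show an apostrophe.

```python
import sqlite3


def store(rows: list[tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Insert (query, link, page, thumb) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one column per value in the INSERT statements above.
    conn.execute("CREATE TABLE serp (query TEXT, link TEXT, page TEXT, thumb TEXT)")
    # Placeholders let sqlite3 handle quoting, so apostrophes are stored safely.
    conn.executemany("INSERT INTO serp VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = store([
    ("cat's eyes",  # made-up query containing an apostrophe
     "http://fishsubsidy.org/wp-content/uploads/2020/01/abyssinian-cats.jpg",
     "http://fishsubsidy.org/category/cat/cat-breeds/",
     "https://tse3.mm.bing.net/th?id=OIP.uHEu4-5TLJ6SSgDree6ahQHaI4&pid=Api"),
])
```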

A-Parser lets you chain tasks. Once the first task is completed, the second one starts and uses the links collected by the first task as its queries.
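
The data flow between chained tasks can be pictured with a short Python sketch. The first task's output (one link per line, as in the Results example) becomes the query list for the second task. A-Parser performs this hand-off itself. The helper name `chain_queries` and the deduplication step are assumptions for illustration only.

```python
def chain_queries(first_task_output: str) -> list[str]:
    """Turn the first task's output (one link per line) into queries for the second task."""
    seen = set()
    queries = []
    for line in first_task_output.splitlines():
        link = line.strip()
        # Skip empty lines and repeated links (the deduplication is an assumption).
        if link and link not in seen:
            seen.add(link)
            queries.append(link)
    return queries
```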

Example preset (import it into A-Parser to use it):

eJyNVktT2zAQ/iuMhkNoQ2IOvfjCBGimdCihEE4hnVHjtSuQJSPJAcbkv3dXNn6k
JvRmrfa9335ywRy3D/bKgAVnWbgoWOa/WcjOdJIJCXvnKU9gL9JPSmoegWFDlnFj
wZD+gt18DcNKNQy9rkWNCGKeS8eWyyFDh/hpp9qknBzvZ0ejKkp9ecPXMNd4GaOf
RjzF0yVPgawi7oBuR7F3NDgYuWfywKNIOKEVl2UESquJeqvEY0721hmhEtTHoxFg
p0anKHbgnZDw5S3DBdv3Z4Zucm//s7RhYcylhSGzmO6UYzLR9o1wYLjTZpZRTigv
mFYTKS9gDbJR8/5PciGxoXYSo9F5ZdivMvvHx6YusR1qDebJYA61F386mf1orCJ9
oROsPPqNdUuRCodne6pzRcMJUPgAkNV9u9QoSbWBOowzOdTBEToZqAgVm6lNskbU
qaIzma5wpVUskhnmb0QEb5q5miM+Z+pUp5kEKouVENs7a+Mxt3DdAGZiq6HQoU53
29WpD0h9qKA6ZE5rab/flIlnRiAev1C6Kba1nUPV2hWX8vb6opNdgy/yrBNYaaEY
6TpINEIK69oMi84GXYILw2/z+VVrb1DFQALP6AU740C5Q/eSQTj6dCyoAeNBppLX
+wyOk9dExAe++ag/RyW0EbQXxvCXanmouPJmlVun07Jb9cRQ/ge4H8abqHIUS97a
UloN2hdfwppLDxClFTT+cUOdn4n2cKHFA2XbkcqFK/WqKEo7eMy5ZJs2XzS770u2
43IrRypPD/eL6hsVNiNilNJtD2JULuU7QO/DcC89/A96doIwaPZO6Q9Ja5sN30Fa
G1T9NIucyXuWsGBW52ZFbkqiI+zTcKmdbDmssTceLH6Nl58P7u5Gg+OwC7n9HsxV
ICjNN8th85r0rW0PXWzRbdBHRvVm9rBw0N6/7ZXusFjwDhFtvxt+XB+xerCb0bev
O2webPo4JtjxJvVz5a7HIGg/BBTQzwB7fuSHVJJR/RdQ9D7qYYF2DVfhEc/39qo0
JphXOhjW+tU/2vwFUiDnrA==

Possible settings

| Parameter | Default value | Description |
|---|---|---|
| Pages count | 5 | Number of pages to parse |
| Util::ReCaptcha2 preset | default | Determines whether to use Util::ReCaptcha2 to bypass captchas |
| ReCaptcha2 retries | 3 | Number of attempts to submit a reCAPTCHA response without changing the proxy |