
Presentation and formatting of results

Available formats for saving results

A-Parser formats results with the Template Toolkit template engine, which makes it easy to save parsing results in a variety of formats:

  • In text files as a list: one result per line, separated by a delimiter, in any format
  • In CSV files with the ability to further import into Excel, Google Docs, etc.
  • In XML, JSON, and other data storage formats
  • In HTML by generating pages on the fly
  • In SQL dump format for direct import into a database, or written directly to an SQLite database
  • In binary format for saving images (jpg, png, gif, ...), documents (pdf, docx, ...), executable files and archives (exe, dmg, zip, ...) and any other data types

Editing result format

The result format uses templates to bring results into the desired form and is applied to each query together with its results.

  • Common result format is set in the Result format field
  • A separate result format can be set for each parser in its settings, in the Result format field

A-Parser can run several parsers in one task, so the common result format must specify which parser each result is output from:

  • $p1 - results from the first parser (e.g. SE::Google), $p2 - results from the second parser (e.g. SE::Bing)
  • The parser number is shown to the left of the parser selection field
  • $p1.preset and $p2.preset mean that the result format is taken from the settings of the corresponding parser
  • In this example, $p1.preset can be replaced with $p1.serp.format('$link\n') with the same effect; in that case the result format from the parser settings is not used

The result format can also be edited in a convenient multiline editor: click the corresponding icon in the editing field.

The following variables are available in the common result format:

  • $query - query after formatting
  • $query.* - all variables related to the query, described in the article Templates in queries
  • $p1, $p2, ... - variables for accessing the parsing results of each parser separately (see Viewing available results below)
  • $p1.query, $p2.query, ... - queries after formatting with the query format specified in the settings of each parser

Prepend and append text

Each result file can have its own Prepend and Append text. It can be used:

  • To form the header of a CSV file
  • For the opening and closing tags of an XML file
  • For the header, body, and footer of HTML files
  • For any other purpose

To enable this feature, click the More options button at the bottom of the Task Editor.

Prepend and Append text can also use the Template Toolkit template engine; the following variables are available:

  • $query - query after formatting
  • $query.* - all variables related to the query, described in the article Templates in queries
info

Important! These variables are only available when saving each query to a separate file or when using these same variables in the Result file name format.

Result file name format

A-Parser supports templates in result file names, so files and folders can be created automatically based on the current date, the query's sequence number, the query itself, or any other format. The following variables are supported in the File name field:

  • All variables available for the Common result format
  • $queriesfile - the path and file name of the queries file; if the queries are entered in the form, it contains queries_from_text.txt
  • $datefile - a Template Toolkit date plugin object configured with the date format %b-%d_%H-%M-%S; when formatted, it outputs the current date and time in the form May-08_20-08-38. The format can be changed in the Additional settings
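The $datefile pattern uses strftime-style conversion codes (%b abbreviated month, %d day, %H-%M-%S time). As a quick illustration outside A-Parser, Python's strftime applied to an arbitrary fixed moment reproduces the sample value above. The helper name is hypothetical, and English month abbreviations assume the default C locale.

```python
from datetime import datetime

# Default $datefile pattern quoted in the list above.
DATEFILE_FORMAT = "%b-%d_%H-%M-%S"


def datefile_name(moment: datetime) -> str:
    """Render a moment with the $datefile pattern (hypothetical helper).

    %b is locale-dependent; with Python's default C locale it is "Jan".."Dec".
    """
    return moment.strftime(DATEFILE_FORMAT)


print(datefile_name(datetime(2021, 5, 8, 20, 8, 38)))  # May-08_20-08-38
```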

By default, the file name is created based on the date and time at the start of the task.

Complex example

reports/$queriesfile/${query}.txt
  • A reports folder will be created.
  • A subfolder with the name of the query file will be created.
  • The subfolder will contain one file per query used in the task, named after the query itself with the .txt extension.
tip

The $query variable is written as ${query} so that the .txt extension is not interpolated as part of the variable name; see the Template Toolkit documentation for details.
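To see why the braces matter, here is a deliberately simplified Python model of this kind of interpolation. It assumes, as the tip describes, that a dot after a variable name continues the variable path (as in $serp.0.link), and it renders anything it cannot resolve as an empty string. `render` and `lookup` are hypothetical helpers; the real Template Toolkit rules are richer than this.

```python
import re

# Toy interpolation rules (assumption, not A-Parser's real engine):
# $name may contain dots, which act as lookups (like $serp.0.link);
# ${name} ends the variable name at the closing brace.
VAR = re.compile(r"\$\{([\w.]+)\}|\$([\w.]+)")


def lookup(data, path):
    """Resolve a dotted path; anything that cannot be resolved renders as ""."""
    value = data
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif isinstance(value, list) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]
        else:
            return ""
    return str(value)


def render(template, data):
    """Replace every $name / ${name} in the template with its value."""
    return VAR.sub(lambda m: lookup(data, m.group(1) or m.group(2)), template)


data = {"queriesfile": "queries_from_text.txt", "query": "test"}
print(render("reports/$queriesfile/${query}.txt", data))
# reports/queries_from_text.txt/test.txt
print(render("reports/$queriesfile/$query.txt", data))
# reports/queries_from_text.txt/   ("txt" was read as part of the variable)
```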

⏩ Video. Naming result files

In this video, we will give several examples of naming the result file:

  1. Numbering the result file according to the queries.
  2. Numbering the result file + part of the query name.
  3. Naming the result file according to the query, if the query is a link.

Viewing available results

Each scraper has its own set of results. To view them, hover the pointer over the scraper: the tooltip lists the simple results and the arrays, along with their nested elements.

Results common to all scrapers are highlighted in yellow:
  • $query - the query passed to the scraper after formatting
  • $query.orig - the original query (as it appeared in the file or in the query input field)
  • $query.first - the first query when nested parsing options are used (Parse all results or Parse to level)
  • $info.success - whether this query was parsed successfully
  • $info.retries - the number of attempts used for this query
  • $info.stats - the scraper's work statistics for this query
  • $pages.$i.data - an array of raw server responses, so you can extract additional information yourself
Results available only in the SE::Google scraper are highlighted in green:
  • $totalcount - the number of search results
  • $misspell - whether there is a typo in the query
  • $detected_geo - the detected geo
  • $ads with elements $link, $anchor, $visiblelink, $snippet, $position, and $page - an array with a list of ads
  • $related.$i.key - an array with a list of related keywords
  • $rich.$i.name - an array of rich (extended) snippets
  • $serp with elements $link, $anchor, $snippet, $amp - an array with the main search engine results
note

For arrays, the $i variable is shown explicitly: it means the array holds several elements, which can be accessed by index (position number) or iterated over in a loop.

tip

For scrapers that do not go through several pages within one query, such as DeepL::Translator, the $pages.$i.data result is automatically replaced with $data.

Results representation

A-Parser is designed to parse any kind of information, so it uses 2 types of results:

  • Simple results (Flat)
  • Result arrays (Array)

Let's consider each type using the SE::Google scraper and its search results as an example.

Simple results

Simple results are used when one query corresponds to one result, for example:

  • The number of results for the query ($totalcount)
  • Whether the query contains a typo ($misspell)

Other examples:

  • The Alexa Rank value ($rank) in the Rank::Alexa scraper
  • The translated text ($translated) in the DeepL::Translator scraper
  • The number of referring domains ($domains), the Trust Flow value ($trustflow), the number of backlinks ($backlinks), etc. in the Rank::MajesticSEO scraper

Simple results are stored in regular variables: the $ prefix followed by a name in Latin letters.

Result arrays

Result arrays are used when one query corresponds to a list of results; each element of the list can, in turn, contain several nested elements. Take the Google search results as an example: in the scraper they are represented by the $serp array. For clarity, here are the first 5 search results as a table:

Link ($link) | Anchor ($anchor) | Snippet ($snippet)
http://www.speedtest.net/ | Speedtest.net by Ookla - The Global Broadband Speed Test | Test your Internet connection bandwidth to locations around the world with this interactive broadband speed test from Ookla.
http://en.wikipedia.org/wiki/Test_cricket | Test cricket - Wikipedia, the free encyclopedia | Test cricket is the longest form of the sport of cricket. Test matches are played between national representative teams with "Test status", as determined by the ...
http://www.speakeasy.net/speedtest/ | Speakeasy Speed Test | Saturday 03-May 2014, 11:04:29 AM Your IP: The Speakeasy Speed Test requires Flash v7 or higher. Please update your browser. See Pricing Or Call Today
http://www.humanmetrics.com/cgi-win/jtypes2.asp | Personality test based on C. Jung and I. Briggs Myers type theory | Humanmetrics Jung Typology Test™ instrument uses methodology, questionnaire, scoring and software that are proprietary to Humanmetrics, and shall not be ...
http://test-ipv6.com/ | Test your IPv6. | This will test your browser and connection for IPv6 readiness, as well as show you your current IPV4 and IPv6 address. ... Test your IPv6 connectivity. JavaScript ...

Each search result is an array element with 3 nested elements: link ($link), anchor ($anchor), and snippet ($snippet).

Another example is a list of related keywords, which is stored in the $related array:

Keyword ($key)
test wwe
depression test
test my speed
wonderlic test
test personality
act test
jiggle test
bipolar test

This array has only one nested element: keyword ($key).

Array elements are numbered from 0. Examples of accessing individual elements:

  • $serp.0.link - the first link from the search results
  • $serp.3.anchor - the fourth anchor from the search results
  • $related.0.key - the first related keyword
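In programming terms, an array is a list of records, and a dotted path such as $serp.0.link walks into it step by step. The Python sketch below models $serp and $related as lists of dictionaries. This layout is an assumption based on the variable names, not a copy of A-Parser's internal JSON, and the values are taken from the tables above (two rows each).

```python
# Illustrative data shaped like the scraper's results (assumed layout).
result = {
    "serp": [
        {"link": "http://www.speedtest.net/",
         "anchor": "Speedtest.net by Ookla - The Global Broadband Speed Test"},
        {"link": "http://en.wikipedia.org/wiki/Test_cricket",
         "anchor": "Test cricket - Wikipedia, the free encyclopedia"},
    ],
    "related": [{"key": "test wwe"}, {"key": "depression test"}],
}

# $serp.0.link: index 0 is the first element, then its nested "link" field.
first_link = result["serp"][0]["link"]
# $serp.1.anchor: the second anchor (numbering starts from 0).
second_anchor = result["serp"][1]["anchor"]
# $related.0.key: the first related keyword.
first_keyword = result["related"][0]["key"]

print(first_link, second_anchor, first_keyword, sep="\n")
```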

Formatting simple results and arrays is described in more detail below.

Formatting principles

Once the scraper has collected data into simple results and arrays, the data must be output (saved to a file) in the required format. For this, A-Parser uses the Template Toolkit template engine, which is both convenient and powerful. Let's look at frequently used constructions in the Template testing tool, with the SE::Google scraper selected.


The Template testing tool has 3 fields:

  • JSON - internal representation of data in the scraper
  • Template - the template according to which the result is formatted
  • Result - the data converted according to the specified template, in the form in which the result will be written to the file

Changing the template changes how the result looks. Consider the following template:


Text in the Template field:

Report for query: $query
Competition: $totalcount
List of links, anchors and snippets:
$serp.format('$link $anchor\n$snippet\n\n')

The main rules are:

  • Ordinary text is output to the result as is, without changes
  • To output a simple result, place the variable that holds it, with the $ prefix, where it should appear
  • Arrays are formatted with the format method, described below
  • \n inserts a line break

Formatting arrays

Consider the following construction:

$serp.format('$link $anchor\n$snippet\n\n')

This entry calls the format method on the $serp array with the parameter '$link $anchor\n$snippet\n\n'. The format method joins all elements of the array into one string according to the template passed as the parameter. Here the template means: for each element of $serp, output the link and the anchor separated by a space, then the snippet on a new line, followed by two more line breaks, which leaves an empty line between results.
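The behavior of format can be modelled in a few lines of Python. `format_array` is a hypothetical toy, not A-Parser code: it applies the template to every element, replacing each $name with that element's field, and concatenates the pieces in order. The two elements come from the table above, with snippets shortened.

```python
import re


def format_array(items, template):
    """Toy model of $array.format(template) (assumption, not A-Parser code).

    Each $name is replaced with the element's field of that name;
    unknown fields render as an empty string.
    """
    def render(item):
        return re.sub(r"\$(\w+)", lambda m: str(item.get(m.group(1), "")), template)

    return "".join(render(item) for item in items)


serp = [
    {"link": "http://www.speedtest.net/",
     "anchor": "Speedtest.net by Ookla - The Global Broadband Speed Test",
     "snippet": "Test your Internet connection bandwidth ..."},
    {"link": "http://test-ipv6.com/",
     "anchor": "Test your IPv6.",
     "snippet": "This will test your browser and connection for IPv6 readiness ..."},
]

# Python's "\n" plays the role of \n in the A-Parser template.
print(format_array(serp, "$link $anchor\n$snippet\n\n"), end="")
```

The empty line between results comes only from the trailing \n\n in the template; an empty array produces an empty string.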

Using the template engine

Outputting variables

To use the template engine directly, put the logic to execute inside [% %] tags.


Looping through an array

To output array elements, use the FOREACH construction:

[% 
FOREACH i IN p1.list;
i.cms _ "\n";
END
%]
tip

For more information and examples, see Features of working with templates in A-Parser.

Examples

Outputting competition

Outputting the competition for a query (the number of results for the query) with any search engine scraper (SE::Google, SE::Yandex, ...).

Result format:

$query: $totalcount\n

Result:

test: 3910000000
viagra: 278000000
окна пвх: 3220000
...

Outputting links

Outputting links from search engine results.

Result format:

$serp.format('$link\n')

Result:

http://www.speedtest.net/
http://www.speakeasy.net/speedtest/
http://en.wikipedia.org/wiki/Test_cricket
http://www.humanmetrics.com/cgi-win/jtypes2.asp
http://html5test.com/
http://test-ipv6.com/
...

Parsing suggestions

Outputting search engine suggestions.

Result format:

$results.format('$suggest\n')

Result:

тестовый сервер танки онлайн
тесты гиа по русскому языку
тесто для блинов рецепт
тестикула
тесто для пиццы на молоке

Outputting data about the response

In Net::HTTP and scrapers based on it, the following additional results are available:

  • $proxy - the proxy on which the request was executed

  • $headers - response headers

  • $code - response code

  • $reason - response status


Output of variable values in JSON

$results.json

The .json method outputs data in JSON format.

Output of all request redirects

For this, use the $response variable, which gives access to all request data, including all previous redirects.

Result format:

$response.Redirects.format('$URI\n--> ')$response.URI

Result: the URLs of all redirects, one per line, with each subsequent line prefixed by "--> " and the final URL on the last line.
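The string this template builds can be modelled in Python. The sketch assumes that Redirects lists the earlier URIs in order and that $response.URI is the address the request finally ended on, which is what the template implies; `redirect_chain` and the example.com URLs are hypothetical.

```python
def redirect_chain(redirects, final_uri):
    r"""Model of $response.Redirects.format('$URI\n--> ')$response.URI.

    Assumption: redirects holds the earlier URIs in order, final_uri the last one.
    """
    # format() appends "\n--> " after every earlier URI ...
    prefix = "".join(uri + "\n--> " for uri in redirects)
    # ... and the final URI is written right after the last arrow.
    return prefix + final_uri


print(redirect_chain(["http://example.com/", "https://example.com/"],
                     "https://www.example.com/"))
```

With no redirects, only the final URI is printed.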

Output in JSON using a template engine to record the date

This example outputs the Net::Whois scraper results in JSON format.


Each record contains the checked domain, the date of the check, and the check result (the domain's expiration date). As the Result format shows, the date is obtained with the Template Toolkit date plugin.

Result format:

{
"domain": "$query",
"date": "[% USE d = date(format = '%d.%m.20%y', locale = 'C');d.format() %]",
"expire": "$p1.expire_date"
},

Example result:

[{
"domain": "a-parser.com",
"date": "05.05.2021",
"expire": "25.02.2022"
},
]
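Every record, including the last one, ends with "},", so the file ends with a comma before the closing "]". Strict JSON parsers, such as Python's json module, reject that trailing comma. If you load the file with such a parser, a small post-processing step can drop it first. The helper below is a hypothetical sketch, not an A-Parser feature, and it assumes the records are wrapped in [ and ] as in the example result.

```python
import json
import re


def load_records(text):
    """Parse '[{...},{...},]'-style output (hypothetical helper).

    Drops the comma left after the last record, then uses the strict json module.
    """
    cleaned = re.sub(r",\s*\]\s*$", "]", text.strip())
    return json.loads(cleaned)


sample = """[{
"domain": "a-parser.com",
"date": "05.05.2021",
"expire": "25.02.2022"
},
]"""

records = load_records(sample)
print(records[0]["domain"], records[0]["expire"])  # a-parser.com 25.02.2022
```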
Download example

How to import an example into A-Parser

eJxtVG1v2jAQ/ivWCUQrZaxM2pdM+0BRkTYx6ErRPgQ0efWFeXXszHYYVZT/3nMS
EtruQyTf23P33EtK8Nw9uluLDr2DOCkhr98QwxJ9HP/4baRj79jNkWe5QjaBCHJu
HdrgnZw5kUFgygvlYbeLgFDo6ebGZjyglVvNGNuCMBmXegsxvQd/C7RPW4hONu6x
sSRDtlnfMME+s6C8SGsYkkZDMR5m4w9Xw6dRxJR54FQUqWejy09i3LhdXLLhrkfF
Yy5tizvIJ+NG/tkkI6eKPugKXvMD3hsqOJUKe/WcpCXPkAyDEBmsXbqxP3py5UJI
L43mqmEdOtR3YqMl0aV4bcg3MJfo5tZkfa66HaeOJW17gCCKOvZ7EwNxypXDCByV
OucUKl5bpEfLvbGrPNRD+hKMniq1wAOq3q3Gvy6kEjTOaUpBX9rA/7us3mBUHb3z
VAe0/yzVALG3BYHUwvXqWx8kzMLsibj4RbSVzKQn2c1MocOuXJHyETHvWrYMLcuM
xS5LA9zmpvXNUQtyTPqJTfNWV1Y7eEXkxWBeKh+MTuV+RRSsFHjyLPQ93clKz0y4
gsBMF0rRYBze9Qsyde0ggtC18E3wrE4R2Lf3EoE3Rrmva9KF+7KSFvBjKDCjXp5n
bSFp69XmbnFugbOlqrMnJ/F9c3Ku3tLAkNZ3b2ixiFq16865+weU50cdlxWN64+7
bZwCdK2MgDrkaBYQT6pnHsF5pA==

Checking a site for presence in Google News by keyword


Result format:

[%
linksToOneString = p1.serp.format('$link. ');
matches = linksToOneString.match('.+?(' _ p1.query.domain _ ').+?');

IF matches.0;
p1.query.orig _';yes' _ "\n";
ELSE;
p1.query.orig _';no' _ "\n";
END
%]

Example result:

парсер гугл|a-parser.com;no
парсер гугл|forbes.ru;yes
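Outside A-Parser, the same check can be expressed as a plain substring test. The Python sketch below takes the domain from a keyword|domain query, as in the example result, and appends ;yes or ;no. Splitting the query on "|" and the list of links are assumptions for illustration. A plain `in` test also sidesteps a detail of the template: the domain is inserted into a regular expression unescaped, so each "." in it matches any character.

```python
def news_check(query, links):
    """Return 'keyword|domain;yes' if any link contains the domain, else ';no'.

    Assumes the query has the form 'keyword|domain', as in the example above.
    """
    domain = query.rsplit("|", 1)[-1]
    found = any(domain in link for link in links)
    return query + (";yes" if found else ";no")


# Hypothetical search result links, for illustration only.
links = ["https://www.forbes.ru/news/1", "https://example.com/news"]
print(news_check("парсер гугл|a-parser.com", links))  # парсер гугл|a-parser.com;no
print(news_check("парсер гугл|forbes.ru", links))     # парсер гугл|forbes.ru;yes
```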
Download example

How to import an example into A-Parser

eJylVVFv2jAQ/iuR1YpWo1EQ7UuqaaIMpk6stKV9AoRccqQeju3aCQVR/vvOTkjS
ruNlEoq4u+++O9+dz1uSUrM0txoMpIaE4y1R7j8JyagXhj+kjDl4Z173GeZLL5IJ
ZcLDX2EQ8Go8WNNEcSBNoqg2oC3PuOaOhggWNOMpaW5JulGA7HIFWrPIGlmEsqIx
zGUmEENWlGeIae3+DccwqkIS8WrIATRnYmkUumCQGn9wwCXmNfr7x0Psz3Wozg4m
omtQTkU8O4xXWq43GlLNwNQ8W0EQkN102iTYLKyr6UudUNu08fFEuNM+yKGAETqK
2PvqqZZvK+YvHO6kcWQxvtc4vZwIVMyfwSDqo6PvTCcN/8u3k4Y3sywvGeiNX8zB
DAnQ5lg+CTvzJmSCFuLM132viOQHqCi5pGYIbVxuwDTeufQGo95nQCE/4G6+T8Tx
lJTFGNEVPEgsxoK54dvXCKUbmtj6HUU0BWvdF+TUT9d28mgUsZRJQXleUTvKVZUf
BXvJx00i1maFbelrmaAqBUfgUt13Y0yOnEyQInO+d7kPCReUG2gSg6n2KSYSlZZU
Z2hgKWiaSj1UNh1Ub4kUHc4HsAJe+Tv6q4zxCK9dZ4FO14Xj55DhXxy78nT1UDiG
rxpzKFmcdDX8VXlFciBjPHj0hMfmLGEpyqbrrnBIAlQuAVRZshtbskRqKMMUzEV0
XEEKhJ35qmMdVaneHcMtKiMzPbfUeY2b+ztk3PSNFGe2IQZwKeUFIW8WJG1f9i75
IJPprmpyLQYq51IsWDwsruV+HDLxgGtzKLrSLj5bJpFxjk02cF8NW8cUTbVCdd6P
zl0XwlZyvyQxScnNz1F+cqUZ5nxhE0ywMfWoBeWccv54P6hbSDWgKEyy4Ly9cN/A
fs/zb8tpLnKN54S2E9rV//bTGz3L97o/lwneuf/iwvv2hBsAt55NO4VYYh9sN939
CUkvf0vssoO1oiKCKL8SO9ej4oEp36tt/ZkJtzucy9/mNgfZqlsI6rB9BofOPil/
AGn6WSM=

Output of timestamp value in date format

Sometimes the results contain a timestamp instead of a regular date, as in the Social::Instagram::Tag scraper. The Template Toolkit date plugin can convert such a value to a date.


Result format:

[% 
USE date;
p1.query.orig _ ": total posts - " _ p1.postscount _ "\nPosts:\n";

FOREACH i IN p1.posts;
d = date.format(i.time, format => '%d.%m.20%y');
i.link _ " - " _ d _ ":\n";
i.text _ "\n";
END
%]

Example result:

sport: total posts - 96500663
Posts:
https://www.instagram.com/p/COfJHshAkeD/ - 05.05.2021:
Quelques exemples de notre nouvelle campagne de communication personnalisable avec le nom des clubs 😀

Vous préférez quel visuel : 1, 2, 3, 4, 5 ? 🤔

#clubnormand #tennis #padel #beachtennis #tenniscourt #padelcourt #beachtenniscourt #lnt #LigueNormandieTennis #🎾 #sport #normandie #normandietourisme
https://www.instagram.com/p/COfJG7olavg/ - 05.05.2021:
💥 Sau màn lật đổ “Bà già” thành công, Nửa xanh thành Milan chính thức vượt qua Nửa đỏ về số lần lên đỉnh nước Ý nhiều nhất lịch sử.
-----------------------------
➖ Website: https://webthethao247.com/
➖ https://g.page/webthethao247?share
#wtt247 #webthethao247 #thethao #sport #bongda #SerieA #InterMilan #Juventus #ACMilan
https://www.instagram.com/p/COfJG1Hg7ax/ - 05.05.2021:
Which Skill was better 1 or 2? 🤔👇
Follow @ftb4ll for more 💥
Follow @ftb4ll for more 💥
Follow @ftb4ll for more 💥
________________________________________
Leave a Like 👍🏽
Subscribe for more 🔔
Leave your thoughts in the Comments  💬
________________________________________
❌Ignore the Tags ❌
#football #soccer #fussball #futbol #fifa #championsleague #bundesliga #ucl #footballmemes #goal #transfer #sports #penalty #ultimateteam #pacybits #fut #ultras #laliga #freekick #referee #sport #calcio #messi #ronaldo #skills #premierleague #foul #footballseason
https://www.instagram.com/p/COfIlXqhfAa/ - 05.05.2021:
Be Fuckin’ Ready 🤣🤣🤣

Get ready to fly!!!! 🏐🏐🏐🏐

Follow - @crackonkings

#beachball #nalin&kane #trance #music #90s #onyerhead #festival #party #afterparty #love #summer #uk #happy #sesh #crackon #football #sport #festivaloutfit #festivalfashion #sun #dj #dancing #club #festivalgirl #house #techno #rave
...
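For reference, the conversion done by date.format in the template can be sketched in Python. The sketch assumes i.time holds seconds since the Unix epoch; the timestamp below is an arbitrary example, not taken from the output above. It uses UTC for reproducibility, while the template is assumed to format in the server's local time zone, so dates near midnight may differ.

```python
from datetime import datetime, timezone


def post_date(timestamp):
    """Format a Unix timestamp (seconds) as dd.mm.yyyy.

    %Y gives the four-digit year directly, equivalent to '20%y' for 2000-2099.
    UTC is used here; the template's time zone is assumed to be local time.
    """
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%d.%m.%Y")


print(post_date(1620172800))  # 05.05.2021
```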
Download example

How to import an example into A-Parser

eJx1VNuO2jAQ/RXLAm1XotFupb6k2kosBZWKAuXyRFDlYoNcHDtrOxSE+PfOODfY
dvOUOZ6Zc+Zin6lnbu+mVjjhHY1XZ5qFfxrTudlIpuJ4qJ1nO8vSOF6wHXlPFjIV
AKUZ8YZw5gURR7CUoB2aMeuExUyrNxKAExdblitPO2fqT5kALnMQ1kqOGSQHe2ts
yjwoCW70wFSObqt2opfzfiD9lOhEZ4/RSy7sKTJW7shPktAYRHmmSGacdyA2oQCD
W7A3Jtc+uCWJniISw09CQ67BZNbv9r4SSYbjOgJPPCdPgTIqZL2TkYcWdEhhkqfP
5K7No3YafXhon+7uIYbAJyMl9T6wVTp4IbHiLLy8ONaaAtwff0l0e00v63WHFj1w
g0AFLWihsmJE9eGcHcTCYNtkGEIVA9aYpdi4FsrH06qE+8gfMQPjXHppNFMFA06u
YV1q+RIarw34YqelcANrUoBQdgmeKnUr2go2hRR5iP1RxNB4y5QTHepA6oCBEP76
RHphmTd2kqEewM/U6K5SI3EQqnEL+Z9zqTisWXcLQcMy8P8uk39yXOryrqlgAf9Y
0FBnCdbz5HsTxc3I7KBy/gvqVjKVHmzXw50C9AHAvRBZ3bMx9iw1VtQ0ZeaSHS5d
JjRuezOybtZAN2XcjOUW3Bi9lbtJeYMqz1wv4GZPdM/gzcS6dK4UjMWJWbMeXVeO
AY1G4OvgXqDA0qubS70xyn2bF1IzK2H9PqLAFDp5zVqm3DCllrPR9QltVgoMlxmL
eTewqDsDKwRlXNb1g1K/UOe3npX4fIEZ/XbTIgALQnfAoDMOBkDjx8tfNDez1Q==

Other examples

Examples of frequently used templates.