Features of Template Operation in A-Parser
Testing Templates
A-Parser provides a dedicated tool for debugging and testing templates: Template Testing.
The .format Method for Arrays
In A-Parser, most results are presented as arrays with nested elements. In technical terms, the results are an array of hashes in which each hash has a fixed set of keys. Take the SE::Google scraper as an example: its results contain an array $serp with elements $link, $anchor, and $snippet (among others):
"serp" : [
{
"link" : "http://www.speedtest.net/",
"anchor" : "Speedtest.net by Ookla - The Global Broadband Speed <b>Test</b>",
"snippet" : "<b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla."
},
{
"link" : "http://www.speakeasy.net/speedtest/",
"anchor" : "Speakeasy Speed <b>Test</b>",
"snippet" : "Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ... <br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher."
},
{
"link" : "http://en.wikipedia.org/wiki/Test_cricket",
"anchor" : "<b>Test</b> cricket - Wikipedia, the free encyclopedia",
"snippet" : "<b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with \"<b>Test</b> status\", as determined by the ..."
}
]
For convenient traversal and output of such an array, there is the .format method, which combines all elements of the array according to a given format. For example, all links, each followed by a newline:
$serp.format('$link\n')
As a result, each link is written on its own line:
http://www.speedtest.net/
http://www.speakeasy.net/speedtest/
http://en.wikipedia.org/wiki/Test_cricket
Output of snippets:
$serp.format('$snippet\n')
<b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla.
Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ...<br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher.
<b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with "<b>Test</b> status", as determined by the ...
Links, anchors, and snippets simultaneously:
$serp.format('Link: $link, Anchor: $anchor, Snippet: $snippet\n')
Link: http://www.speedtest.net/, Anchor: Speedtest.net by Ookla - The Global Broadband Speed <b>Test</b>, Snippet: <b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla.
Link: http://www.speakeasy.net/speedtest/, Anchor: Speakeasy Speed <b>Test</b>, Snippet: Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ... <br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher.
Link: http://en.wikipedia.org/wiki/Test_cricket, Anchor: <b>Test</b> cricket - Wikipedia, the free encyclopedia, Snippet: <b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with "<b>Test</b> status", as determined by the ...
The format can also use the original query (or any other available variable), which lets you pair the query with each element of the array:
$serp.format('$query: $link\n')
test: http://www.speedtest.net/
test: http://www.speakeasy.net/speedtest/
test: http://en.wikipedia.org/wiki/Test_cricket
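For readers who want to reason about what .format does, its behavior can be modeled in a few lines of Python. This is only a sketch of the semantics described above, not A-Parser code: the function name format_array, the rule that element keys take precedence over outer variables such as $query, and the rendering of unknown variables as empty strings are all assumptions made for illustration.

```python
import re

def format_array(items, template, outer=None):
    """Render `template` once per element and concatenate the pieces."""
    # Turn the two-character sequence \n into a real newline first.
    template = template.replace("\\n", "\n")
    rendered = []
    for item in items:
        # Assumption: element keys shadow outer variables like `query`.
        scope = {**(outer or {}), **item}
        # Replace each $name with its value (empty string if unknown).
        rendered.append(
            re.sub(r"\$(\w+)", lambda m: str(scope.get(m.group(1), "")), template)
        )
    return "".join(rendered)

serp = [
    {"link": "http://www.speedtest.net/"},
    {"link": "http://www.speakeasy.net/speedtest/"},
]
print(format_array(serp, r"$query: $link\n", {"query": "test"}), end="")
```

The call at the end prints one "test: ..." line per element, which mirrors the $query example above.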
The .json Method for Objects
All data in A-Parser is represented as variables. The .json method serializes such data (converts it to the String type) in JSON format. For example:
$results.json
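Because the output of .json is ordinary JSON text, it can be read back by any JSON parser. The Python sketch below illustrates the round trip. The contents of the results dictionary are a made-up stand-in (the actual structure of $results is not shown here), and links_from_json is a hypothetical helper.

```python
import json

# Hypothetical stand-in for the data behind $results.
results = {
    "query": "test",
    "serp": [
        {"link": "http://www.speedtest.net/", "anchor": "Speedtest.net by Ookla"},
        {"link": "http://en.wikipedia.org/wiki/Test_cricket", "anchor": "Test cricket"},
    ],
}

# Serialization: the whole structure becomes one string.
serialized = json.dumps(results)

def links_from_json(text):
    """Parse the JSON text and return the link of every serp element."""
    data = json.loads(text)
    return [row["link"] for row in data.get("serp", [])]

print(links_from_json(serialized))
```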
Static Template Flag in Result File Name
The isStaticTemplate() flag makes a dynamic template in the Result File Name static.
How it works: when this flag is used in the Result File Name, the template is executed once at the start of the task and is therefore treated as static. This allows more flexible file naming while keeping the ability to get links to the files through the API method getTaskResultsFile.
Example of use:
[% isStaticTemplate(); tools.js.eval('Date.now()') %]
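The difference between a dynamic and a static file name template can be pictured with a small Python model. It is a sketch under assumptions, not A-Parser's implementation: the class FileNameTemplate and its methods are invented for illustration, and a counter stands in for Date.now() so that the result is reproducible.

```python
import itertools

class FileNameTemplate:
    """Model of a result file name template."""

    def __init__(self, render, static=False):
        self._render = render      # callable that executes the template
        self._static = static      # True models the isStaticTemplate() flag
        self._cached = None

    def start_task(self):
        # A static template is executed exactly once, at task start.
        if self._static:
            self._cached = self._render()

    def file_name(self):
        # Static: reuse the cached name. Dynamic: execute again.
        if self._static and self._cached is not None:
            return self._cached
        return self._render()

def make_renderer():
    ticks = itertools.count(1000)  # stand-in for Date.now()
    return lambda: f"{next(ticks)}.txt"

dynamic = FileNameTemplate(make_renderer())
static = FileNameTemplate(make_renderer(), static=True)
dynamic.start_task()
static.start_task()
print([dynamic.file_name() for _ in range(3)])
print([static.file_name() for _ in range(3)])
```

In this model the dynamic template yields a new name on every call, while the static one keeps the name computed at task start, so a single file name can be associated with the task.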
Available Variables
When forming the result file name
Variable Interpolation
By default, template code is written between the tags [% and %]; everything outside the tags is plain text that is passed to the result as is.
A-Parser additionally enables variable interpolation, which gives access to variables in plain text through the $ symbol.
In addition, \n is interpolated as an actual newline.
Example:
Total results for query $query: $totalcount\n
The values of the corresponding variables are substituted for $query and $totalcount, and \n is replaced with a newline.
Equivalent form without interpolation:
Total results for query [% query %]: [% totalcount; "\n" %]
Note that in Template Toolkit templates, variables are written without the $ prefix.
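The correspondence between the two notations is mechanical, as the following Python sketch suggests. It covers only plain $name variables and \n. The helper name to_explicit_tags is hypothetical, and how A-Parser itself parses interpolated text (for example method calls such as $serp.format(...)) is not modeled here.

```python
import re

def to_explicit_tags(text):
    """Rewrite interpolated text into explicit [% ... %] tags."""
    # $name becomes [% name %]; note that the $ prefix is dropped.
    text = re.sub(r"\$(\w+)", r"[% \1 %]", text)
    # The two-character sequence \n becomes a tag that outputs a newline.
    return text.replace("\\n", '[% "\\n" %]')

print(to_explicit_tags(r"Results for $query: $totalcount\n"))
```

The sketch emits a separate tag for the newline instead of joining it to the preceding tag with a semicolon; both spellings are expected to produce the same output.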
Template Usage Examples
Formatting Queries
In this example, the search operator site: is prepended to each domain from the file Alexa top500.txt, and substitutions from the file words.txt are appended after a space.
Formatting Results
In this example, the query, the number of results in the output, and the number of related keywords will be displayed. Also, a list of collected anchors will be shown.
Templates when filtering results
To set a template, select Custom Template from the dropdown list.
In this example, only queries for which fewer than 5 results were collected appear in the output.
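The filter template itself is not reproduced in this text, but the selection rule it implements can be sketched in Python. The function name, the dictionary layout, and the sample data are assumptions for illustration only.

```python
def queries_with_few_results(results_by_query, limit=5):
    """Return the queries that collected fewer than `limit` results."""
    return [
        query
        for query, serp in results_by_query.items()
        if len(serp) < limit
    ]

collected = {
    "speed test": ["r1", "r2", "r3", "r4", "r5", "r6"],
    "rare phrase": ["r1", "r2"],
    "no hits": [],
}
print(queries_with_few_results(collected))
```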
Templates when using the Use regex option
In this example, the scraper collects sentences that contain the word passed as the second argument of the query. The algorithm works as follows: the Query Constructor splits the query into a link and a word at the specified delimiter; the scraper follows the link and extracts the page text; the word from the query is substituted into the regular expression, which then collects the matching sentences.
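The steps above can be sketched in Python as follows. Fetching the page is left out, so the text is passed in directly. The helper names, the space delimiter, and the sentence pattern are assumptions, since the regular expression actually used in the example is not shown here.

```python
import re

def split_query(query, delimiter=" "):
    """Split a query into (link, word) at the first delimiter."""
    link, word = query.split(delimiter, 1)
    return link, word

def sentences_with_word(text, word):
    """Collect sentences of `text` that contain `word` as a whole word."""
    # The word is escaped and substituted into the sentence pattern.
    pattern = re.compile(
        r"[^.!?]*\b" + re.escape(word) + r"\b[^.!?]*[.!?]",
        re.IGNORECASE,
    )
    return [match.strip() for match in pattern.findall(text)]

link, word = split_query("http://example.com/page test")
page_text = "Cats sleep a lot. A test is useful! Nothing here."
print(link, sentences_with_word(page_text, word))
```

Escaping the word before substitution keeps characters such as dots or brackets in the query from being read as regular expression syntax.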
Download example
How to import an example into A-Parser
eJyNVE1T2zAQ/StUEwYo1EkoDK0vncA003YCoSScHLejxmtXjWwZSYZkQv57d2XH
diiHXmTp6b3d1X54zSw3C3OrwYA1zA/WLHd75rMIYl5Iy05YzrUBTdcB+zK9Hvn+
FJb289JqPrdKI6Pmrpld5YDqeWGsSu/AlCZ0ufEDZlFqEKEvC+kmgSUKDoNZ0Tvr
997R5zQOgx/em0+zWTab6fA42N97KECvvAWs9vZ37t4GuA+Pj1hlalr6F0nttUK4
1nyFoPve8JQwY7XIElNT6Y0VyMJNGNb4UOmUU1Y6ed+rVF7swMODTglgQAdHjakJ
f4SpQkksJDTwEE+V907ELdDt1tKRZ5eULR5FwgqVcVn6pbCaWO4zgclAfaaQS3kR
YIZapQi5rJbgahtzwDruTNkunPZ7qWF+zKWBE2Yw1CHHQKKXN8KC5ljjcU7xIL5m
KhtIOYJHkA3N2b8shIywTwYxir5Wwtcp439sbOrntV09gn7SGENtxZ0ux9eNKlIj
lWyTIUUqLJ7NlSoyKlcPwQVAXufshmip0lC7qSxX3nEUcsgiZDYlG+QNtPMMNzBG
FXpOpsskn2wnoGyKSS4FVcQATlGZEPZMJEWF2Uqwranfao8tBwjOVRaLZIzJ0CKC
bTMU2RRnd5xdqTSXQDnKCimxxAbumlYbmKqkdGge+1J85VzsTL1VSppvk/LZuRYY
8DkFmGJV2l4rk3Mu5f3dqH3DmvbEw29rc+N3u7rwnsRC5BAJ7imddOnUdVPfp/X9
B7dGtJ6dun3PrdwhLc7Zhduft3DesvDxZ0sctfa/WqSL5/8hMXqfhURhtTD7VKnq
p1j/Otev/hr99QZ79I+5LdlUBOIihtU02IDM72/+Ar1q4iA=
Templates in scraper settings
Example of substituting a random user-agent
Setting up predefined macros
In A-Parser, you can configure template macros and predefined variables that are globally available in all templates. Global macros are specified in Settings -> Additional settings.
By default, this setting already contains the definition of the $datefile object, which is used to format time for the result file name.
Adding and using a macro
This example shows how to set a global variable. This can be useful, for example, if you need to use the same cookies in several Instagram scrapers.
Setting example:
Pay attention to the syntax of the template's closing tag: -%]. It is needed to remove the newline character; otherwise, every template that is used would start with an empty line.
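The effect of the closing -%] can be modeled in Python. The sketch assumes that the macro tag produces no output of its own and handles only a newline directly after the tag; the function name and the sample variable are hypothetical.

```python
import re

# A tag, an optional "-" before the closing %], and an optional newline.
TAG = re.compile(r"\[%(.*?)(-?)%\](\n?)", re.DOTALL)

def render_silent_tags(text):
    """Drop tags that print nothing; `-%]` also removes the next newline."""
    def replace(match):
        chomp, newline = match.group(2), match.group(3)
        return "" if chomp else newline
    return TAG.sub(replace, text)

print(repr(render_silent_tags("[% cookies = 'a=1' %]\nresult")))
print(repr(render_silent_tags("[% cookies = 'a=1' -%]\nresult")))
```

With the plain closing tag the rendered text starts with a newline, which is the empty first line mentioned above; with -%] it starts directly with the content.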
Usage example: