Features of working with templates in A-Parser
Testing templates
For debugging and testing templates, A-Parser provides a special tool: Template Testing.
Method .format for arrays
In A-Parser, most results are arrays with nested elements. Technically, they are arrays of hashes, where each hash has a fixed set of keys. Consider the SE::Google scraper as an example: it contains an array $serp with elements $link, $anchor, and $snippet (and others):
"serp" : [
{
"link" : "http://www.speedtest.net/",
"anchor" : "Speedtest.net by Ookla - The Global Broadband Speed <b>Test</b>",
"snippet" : "<b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla."
},
{
"link" : "http://www.speakeasy.net/speedtest/",
"anchor" : "Speakeasy Speed <b>Test</b>",
"snippet" : "Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ... <br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher."
},
{
"link" : "http://en.wikipedia.org/wiki/Test_cricket",
"anchor" : "<b>Test</b> cricket - Wikipedia, the free encyclopedia",
"snippet" : "<b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with \"<b>Test</b> status\", as determined by the ..."
}
]
To traverse such an array and output its data conveniently, the .format method combines all array elements according to a given format. For example, all links, each followed by a line break:
$serp.format('$link\n')
This writes each link on a new line:
http://www.speedtest.net/
http://www.speakeasy.net/speedtest/
http://en.wikipedia.org/wiki/Test_cricket
Output of snippets:
$serp.format('$snippet\n')
<b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla.
Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ...<br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher.
<b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with "<b>Test</b> status", as determined by the ...
Links, anchors, and snippets at the same time:
$serp.format('Link: $link, Anchor: $anchor, Snippet: $snippet\n')
Link: http://www.speedtest.net/, Anchor: Speedtest.net by Ookla - The Global Broadband Speed <b>Test</b>, Snippet: <b>Test</b> your Internet connection bandwidth to locations around the world with this <br>interactive broadband speed <b>test</b> from Ookla.
Link: http://www.speakeasy.net/speedtest/, Anchor: Speakeasy Speed <b>Test</b>, Snippet: Speakeasy Speed <b>Test</b> - Broadband Speed <b>Test</b>. Go to MegaPath Speed <b>Test</b> ... <br>02:38:36 PM Your IP: The Speakeasy Speed <b>Test</b> requires Flash v7 or higher.
Link: http://en.wikipedia.org/wiki/Test_cricket, Anchor: <b>Test</b> cricket - Wikipedia, the free encyclopedia, Snippet: <b>Test</b> cricket is the longest form of the sport of cricket. <b>Test</b> matches are played <br>between national representative teams with "<b>Test</b> status", as determined by the ...
The format can also reference the original query (or any other available variable), which lets you pair the query with each element of the array:
$serp.format('$query: $link\n')
test: http://www.speedtest.net/
test: http://www.speakeasy.net/speedtest/
test: http://en.wikipedia.org/wiki/Test_cricket
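The behavior of .format described above can be approximated outside A-Parser. The following Python sketch is not A-Parser code: format_array is a hypothetical helper, the sample anchors are shortened, and string.Template only stands in for the $ interpolation of the template engine.

```python
from string import Template


def format_array(items, fmt, **extra):
    """Render every element of `items` through `fmt` and join the results.

    `extra` holds variables shared by all elements (for example the query).
    Keys of the element itself take precedence over `extra`.
    """
    template = Template(fmt)
    return "".join(template.safe_substitute({**extra, **item}) for item in items)


# Shortened sample data in the same shape as the $serp array.
serp = [
    {"link": "http://www.speedtest.net/", "anchor": "Speedtest.net by Ookla"},
    {"link": "http://www.speakeasy.net/speedtest/", "anchor": "Speakeasy Speed Test"},
]

# Rough equivalent of $serp.format('$query: $link\n') for the query "test".
output = format_array(serp, "$query: $link\n", query="test")
print(output, end="")
```

Each element produces one rendered fragment, and the fragments are concatenated without any separator, which is why the format itself ends with a line break.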
Method .json for objects
All data in A-Parser is represented as variables. The .json method serializes such data (converts it to the String type) into JSON format. For example:
$results.json
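As an analogy only, serialization of a nested structure to a JSON string looks like this in Python. The shape of results below is a hypothetical stand-in; the actual content of $results depends on the scraper and is not specified here.

```python
import json

# Hypothetical nested data, loosely modelled on the $serp example.
results = {
    "query": "test",
    "serp": [
        {"link": "http://www.speedtest.net/", "anchor": "Speedtest.net by Ookla"},
    ],
}

# Serialize the whole structure into a single string.
serialized = json.dumps(results, ensure_ascii=False)
print(serialized)
```

The result is an ordinary string, so it can be written to a result file or parsed back by any JSON reader.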
Flag of a static template in the result file name
The isStaticTemplate() flag makes a dynamic template in the Result file name format static.
How it works: when this flag is used in the Result file name format, the template is executed once at the start of the task and is therefore treated as static. This lets you name files more flexibly while keeping the ability to get links to them through the API method getTaskResultsFile.
Example of use:
[% isStaticTemplate(); tools.js.eval('Date.now()') %]
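The difference between a static and a dynamic file name template can be illustrated with a small Python sketch. It is a conceptual model under stated assumptions, not A-Parser internals: FileNameTemplate and render_name are hypothetical, and a counter stands in for Date.now().

```python
import itertools


class FileNameTemplate:
    """Render a file name once (static) or on every call (dynamic)."""

    def __init__(self, render, is_static):
        self._render = render
        self._is_static = is_static
        self._cached = None

    def name(self):
        if not self._is_static:
            return self._render()          # re-evaluated for every result
        if self._cached is None:
            self._cached = self._render()  # evaluated once, at first use
        return self._cached


ticks = itertools.count(1000)  # deterministic stand-in for Date.now()


def render_name():
    return f"results-{next(ticks)}.txt"


static_template = FileNameTemplate(render_name, is_static=True)
dynamic_template = FileNameTemplate(render_name, is_static=False)

static_names = [static_template.name() for _ in range(3)]
dynamic_names = [dynamic_template.name() for _ in range(3)]
print(static_names)
print(dynamic_names)
```

Because the static variant always yields the same name, the file can be located later by that single name, which matches the reason given above for keeping getTaskResultsFile usable.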
Available variables
When forming the name of the result file
Variable interpolation
By default, templates are written between the tags [% and %]; everything outside the tags is plain text that is passed to the result as is.
A-Parser additionally enables variable interpolation, which lets you refer to variables in the text through the $ symbol.
In addition, \n is interpolated as an explicit line break.
Example:
Total results for query $query: $totalcount\n
The values of the corresponding variables are substituted for $query and $totalcount, and \n is replaced with a line break.
Equivalent notation without using interpolation:
Total results for query [% query %]: [% totalcount; "\n" %]
Please note that in Template Toolkit templates, variables are written without the $ prefix.
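The two interpolation rules (variables through $ and \n as a line break) can be modelled in a few lines of Python. This is a simplified sketch, not the template engine itself: interpolate is a hypothetical helper, it only handles plain $name references, and the value 1250 is made up.

```python
import re


def interpolate(text, variables):
    """Replace the two-character sequence \\n with a line break, then $name with values."""
    # Handle \n first so that substituted values are never rewritten.
    text = text.replace("\\n", "\n")

    def replace_variable(match):
        name = match.group(1)
        # Unknown names are left untouched.
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", replace_variable, text)


# A raw string keeps the backslash and the letter n as two separate characters.
line = interpolate(
    r"Total results for query $query: $totalcount\n",
    {"query": "test", "totalcount": 1250},
)
print(line, end="")
```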
Usage examples of templates
Formatting queries
In this example, the search operator site: is prepended to each domain from the Alexa top500.txt file, and the words substituted from the words.txt file are appended after a space.
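The resulting queries can be pictured with the following Python sketch. It assumes that every domain is combined with every word; the domains and words are hypothetical in-memory stand-ins for the contents of the two files.

```python
from itertools import product

# Hypothetical stand-ins for the contents of top500.txt and words.txt.
domains = ["example.com", "example.org"]
words = ["buy", "review"]

# Prepend the site: operator to the domain and append the word after a space.
queries = [f"site:{domain} {word}" for domain, word in product(domains, words)]

for query in queries:
    print(query)
```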
Formatting results
In this example, the output contains the query, the number of results, and the number of related keywords, followed by a list of the collected anchors.
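A rough Python model of such a result format is shown below. The layout of the header line and the names render_report, serp, and related are assumptions made for illustration; the actual template in the example may differ.

```python
def render_report(query, serp, related):
    """Build a header with counts, followed by one anchor per line."""
    header = f"{query} - results: {len(serp)}, related: {len(related)}\n"
    anchors = "".join(f"{item['anchor']}\n" for item in serp)
    return header + anchors


# Hypothetical sample data.
serp = [{"anchor": "Speedtest.net by Ookla"}, {"anchor": "Speakeasy Speed Test"}]
related = ["speed test", "internet test", "test cricket"]

report = render_report("test", serp, related)
print(report, end="")
```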
Templates for filtering results
In this example, the output contains only the queries for which fewer than 5 results were collected.
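The filtering condition itself is simple and can be sketched in Python as follows. The queries and result counts are made up; in A-Parser the count would come from the scraper's results rather than from a dictionary.

```python
# Hypothetical mapping of query to the number of collected results.
result_counts = {"alpha": 12, "beta": 3, "gamma": 0, "delta": 5}

LIMIT = 5  # keep only queries with fewer results than this

kept = [query for query, count in result_counts.items() if count < LIMIT]

for query in kept:
    print(query)
```

Note that a query with exactly 5 results is not kept, because the condition is strictly "fewer than 5".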
Templates for using the "Use regex" option
In this example, the scraper collects sentences that contain the word passed as the second argument of the query. The algorithm works as follows: the Query Builder splits the query by the specified delimiter into a link and a word; the scraper opens the link and extracts the text; the word from the query is substituted into the regular expression, which is then used to collect the sentences.
Download example
How to import an example into A-Parser
eJyNVE1T2zAQ/StUEwYo1EkoDK0vncA003YCoSScHLejxmtXjWwZSYZkQv57d2XH
diiHXmTp6b3d1X54zSw3C3OrwYA1zA/WLHd75rMIYl5Iy05YzrUBTdcB+zK9Hvn+
FJb289JqPrdKI6Pmrpld5YDqeWGsSu/AlCZ0ufEDZlFqEKEvC+kmgSUKDoNZ0Tvr
997R5zQOgx/em0+zWTab6fA42N97KECvvAWs9vZ37t4GuA+Pj1hlalr6F0nttUK4
1nyFoPve8JQwY7XIElNT6Y0VyMJNGNb4UOmUU1Y6ed+rVF7swMODTglgQAdHjakJ
f4SpQkksJDTwEE+V907ELdDt1tKRZ5eULR5FwgqVcVn6pbCaWO4zgclAfaaQS3kR
YIZapQi5rJbgahtzwDruTNkunPZ7qWF+zKWBE2Yw1CHHQKKXN8KC5ljjcU7xIL5m
KhtIOYJHkA3N2b8shIywTwYxir5Wwtcp439sbOrntV09gn7SGENtxZ0ux9eNKlIj
lWyTIUUqLJ7NlSoyKlcPwQVAXufshmip0lC7qSxX3nEUcsgiZDYlG+QNtPMMNzBG
FXpOpsskn2wnoGyKSS4FVcQATlGZEPZMJEWF2Uqwranfao8tBwjOVRaLZIzJ0CKC
bTMU2RRnd5xdqTSXQDnKCimxxAbumlYbmKqkdGge+1J85VzsTL1VSppvk/LZuRYY
8DkFmGJV2l4rk3Mu5f3dqH3DmvbEw29rc+N3u7rwnsRC5BAJ7imddOnUdVPfp/X9
B7dGtJ6dun3PrdwhLc7Zhduft3DesvDxZ0sctfa/WqSL5/8hMXqfhURhtTD7VKnq
p1j/Otev/hr99QZ79I+5LdlUBOIihtU02IDM72/+Ar1q4iA=
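The algorithm of the "Use regex" example can be outlined in Python. This is a sketch under assumptions, not the content of the importable example: the delimiter is taken to be a space, the page text is a hard-coded sample instead of a downloaded page, and the regular expression is one possible way to match a sentence.

```python
import re


def sentences_with_word(text, word):
    """Return the sentences of `text` that contain `word` as a whole word."""
    pattern = re.compile(
        r"[^.!?]*\b" + re.escape(word) + r"\b[^.!?]*[.!?]",
        re.IGNORECASE,
    )
    return [match.strip() for match in pattern.findall(text)]


# Step 1: split the query into a link and a word (space delimiter assumed).
query = "http://example.com/ parser"
link, word = query.split(" ", 1)

# Step 2: the text of the page at `link`; a fixed sample replaces the download.
page_text = "A parser reads input. Nothing here. The Parser emits tokens!"

# Step 3: substitute the word into the regular expression and collect sentences.
found = sentences_with_word(page_text, word)
print(found)
```

Escaping the word with re.escape keeps characters such as dots or brackets in the query from being interpreted as regular expression syntax.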
Templates in scraper settings
The screenshot shows an example of substituting a random user-agent.
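The idea of substituting a random user-agent can be expressed in Python as below. The list of user-agent strings is a shortened, hypothetical sample, and the fixed seed is used only to make the sketch reproducible.

```python
import random

# Hypothetical, shortened user-agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

rng = random.Random(42)  # seeded generator, for a repeatable demonstration

# Pick one user-agent per request.
picked = [rng.choice(USER_AGENTS) for _ in range(5)]

for user_agent in picked:
    print(user_agent)
```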
Setting up pre-installed macros
In A-Parser, you can configure template macros and pre-installed variables that will be available globally for all templates. You can specify global macros in Settings -> Additional settings:
By default, the $datefile object is already defined there; it is used to format the time for the result file name.
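What formatting time for a file name means can be shown with a short Python sketch. The pattern used here is an assumption chosen for illustration and is not necessarily the pattern that $datefile produces; date_for_file_name is a hypothetical helper.

```python
from datetime import datetime


def date_for_file_name(moment):
    """Format a timestamp using only characters that are safe in file names."""
    return moment.strftime("%Y-%m-%d_%H-%M-%S")


file_name = date_for_file_name(datetime(2024, 1, 2, 3, 4, 5)) + ".txt"
print(file_name)
```

Avoiding colons and spaces in the formatted time keeps the name valid on all common file systems.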