Template Toolkit Tools (tools)

The Template Toolkit exposes a global variable, $tools, which holds a set of tools available in any template and inside JS scrapers. A companion variable, $tools.error, holds a description of any error that occurred while a tool was running.

Adding queries $tools.query.*

This tool allows you to add queries to the existing ones during the task execution, forming them based on the already scraped results. It can be used as an alternative to the Parse to level function in scrapers where it is not implemented. There are 2 methods:

  • [% tools.query.add(query, maxLevel) %] - adds a single query
  • [% tools.query.addAll(array, item, maxLevel) %] - adds an array of queries

The optional maxLevel parameter sets the deepest level up to which queries are added. If it is omitted, the scraper keeps adding new queries for as long as new ones appear. Enabling the Unique queries option is also recommended, to avoid loops and redundant work.
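The interaction between maxLevel and unique queries can be modelled outside A-Parser. The following is a minimal sketch, not the scraper's actual implementation: `runWithSubqueries`, `expand`, and the toy `related` graph are hypothetical names, and it assumes that initial queries start at level 0.

```javascript
// Hypothetical model of level-limited query expansion; not A-Parser's internal code.
// `expand` stands in for whatever related queries a scraper would return.
function runWithSubqueries(initialQueries, expand, maxLevel) {
  const seen = new Set(initialQueries); // models the "Unique queries" option
  const queue = initialQueries.map((query) => ({ query, lvl: 0 }));
  const processed = [];

  while (queue.length > 0) {
    const item = queue.shift();
    processed.push(item);

    // Once the level limit is reached, this query adds no subqueries.
    if (maxLevel !== undefined && item.lvl >= maxLevel) continue;

    for (const next of expand(item.query)) {
      if (seen.has(next)) continue; // skip duplicates so cycles cannot loop forever
      seen.add(next);
      queue.push({ query: next, lvl: item.lvl + 1 });
    }
  }
  return processed;
}

// Toy "related queries" graph that contains a cycle (a -> b -> a).
const related = { a: ["b", "c"], b: ["a", "d"], c: [], d: ["e"], e: [] };
const expand = (q) => related[q] || [];

const limited = runWithSubqueries(["a"], expand, 1);
const unlimited = runWithSubqueries(["a"], expand);

console.log(limited.map((x) => `${x.query}@${x.lvl}`).join(" "));   // a@0 b@1 c@1
console.log(unlimited.map((x) => x.query).join(" "));               // a b c d e
```

Without the `seen` set, the cycle between `a` and `b` would keep the unlimited run going forever, which is why deduplication matters most when maxLevel is omitted.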

It is possible to set an arbitrary level for subqueries. This can be used to distribute logic, i.e., when each level is a separate functionality.

Example:

  • [% tools.query.add({query => query, lvl => 1}) %] - adds a query to a specific level.

Example for JS:

this.query.add({
    query: "some query",
    lvl: 1,
});
Example result of running a preset that uses this tool:

парсер:
parser
what is parsing in programming
parsing in compiler
compiler and parser development
what is syntax analysis
difference between lexical analysis and syntax analysis
syntax analyzer
parser programming language
parser:
parser definition
xml parser
parser generator
parser swtor
parser c++
ffxiv parser
html parser
parser java
what is parsing in programming:
parse wikipedia
parser compiler
what is a parser
parsing programming languages
definition of parser
parsing c++
parser define
parsing java
html parser:
online html parser
html parser php
html parser java
...

Parsing JSON structures $tools.parseJSON()

This tool deserializes JSON data into an object that is accessible in the template. Example of use:

[% tools.parseJSON(data) %]

After deserialization, the keys from the obtained object can be accessed as normal variables and arrays. If a string with invalid JSON is specified as an argument, the scraper will record an error in $tools.error.
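The "return an object, or record an error" behaviour can be illustrated in plain JavaScript. This is only a sketch built on the standard `JSON.parse`: `parseJsonSafe` and `toolsStub` are hypothetical stand-ins, and the error message format is an assumption rather than what the scraper actually writes to $tools.error.

```javascript
// Sketch only: mimics "return the object, or record an error" using JSON.parse.
// `toolsStub` is a hypothetical stand-in for the $tools.error variable.
const toolsStub = { error: null };

function parseJsonSafe(text) {
  try {
    toolsStub.error = null;
    return JSON.parse(text);
  } catch (err) {
    // Invalid JSON: remember the problem instead of throwing.
    toolsStub.error = `JSON parse error: ${err.message}`;
    return undefined;
  }
}

const ok = parseJsonSafe('{"title": "Example", "links": ["a", "b"]}');
console.log(ok.title, ok.links[1]); // keys are reachable as ordinary fields and arrays

const bad = parseJsonSafe("{not valid json}");
console.log(bad === undefined, toolsStub.error !== null);
```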

Output in CSV $tools.CSVline

This tool automatically converts values to CSV format and appends a line break. In the result format it is therefore enough to list the required variables; the output is a valid CSV file, ready for import into Google Docs, Excel, and similar tools.

Example of use:

[% tools.CSVline(query, p1.serp.0.link, p2.title) %]
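To show what "converting values to CSV format" involves, here is a minimal sketch of a CSV line builder. It assumes RFC 4180 style quoting with a comma separator; the separator and quoting rules of the real tool may differ, and `csvLine` is a hypothetical name.

```javascript
// Assumed quoting rules (RFC 4180 style); the real tool's output may differ.
function csvLine(...values) {
  const cells = values.map((value) => {
    const text = value === null || value === undefined ? "" : String(value);
    // Quote a cell only when it contains a comma, a double quote, or a line break.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  });
  // Every call yields exactly one line, terminated by a line break.
  return cells.join(",") + "\n";
}

const line = csvLine("buy shoes", "https://example.com/?a=1,2", 'He said "hi"');
console.log(JSON.stringify(line));
```

Quoting only when needed keeps simple values readable while still protecting links or titles that contain commas and quotes.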

Working with SQLite DB $tools.sqlite.*

This tool provides full access to SQLite databases. There are three methods:

  • $tools.sqlite.get() - fetches a single result from the DB using SELECT, for example:
[% res = tools.sqlite.get('results/test.sqlite', 'SELECT COUNT(*) AS count FROM test') %]
  • $tools.sqlite.run() - executes statements that modify the DB (INSERT, DROP, etc.), for example:
[% res = tools.sqlite.run('results/test.sqlite', 'INSERT INTO test VALUES(?)', 'test') %]
  • $tools.sqlite.all() - returns all rows selected from a table, for example:
[% res = tools.sqlite.all('results/test.sqlite', 'SELECT * FROM test') %]

Substituting user-agent $tools.ua.*

This tool substitutes the user-agent in scrapers that use one (for example, Net::HTTP). There are two methods:

  • $tools.ua.list() - returns the complete list of available user-agents.
  • $tools.ua.random() - returns a random user-agent from the available ones.
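The two methods amount to "read a list" and "pick one entry at random". A rough sketch follows, assuming the list is plain text with one user-agent per line; that format, the sample strings, and the function names are all illustrative assumptions.

```javascript
// Hypothetical stand-in for the user-agent list; the entries are made up for illustration.
const userAgentText = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
  "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
  "",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/3.0",
].join("\n");

// list(): one user-agent per non-empty line (assumed format).
function listUserAgents(text) {
  return text.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
}

// random(): uniform pick from the list.
function randomUserAgent(text) {
  const agents = listUserAgents(text);
  return agents[Math.floor(Math.random() * agents.length)];
}

console.log(listUserAgents(userAgentText).length);
console.log(randomUserAgent(userAgentText));
```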

Tip: the list of all user-agents is stored in the files/tools/user-agents.txt file, which can be edited if necessary.

Note: to use this tool for the User agent parameter of a scraper, specify it explicitly as the parameter value:

[% tools.ua.random() %]

JS support in tools $tools.js.*

This tool allows you to add your own JS functions and use them directly in the template engine. It also supports the use of Node.js modules. Functions are added in Tools -> JavaScript Editor.

Working with base64 $tools.base64.*

This tool allows you to work with base64 directly in the scraper. This tool has 2 methods:

  • $tools.base64.encode() - encodes text to base64
  • $tools.base64.decode() - decodes base64 string to text
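For readers who want to check what the two operations produce, the same round trip can be reproduced with the `Buffer` class built into Node.js. This is a sketch of the technique, not the tool's implementation; the function names are illustrative, and UTF-8 text encoding is an assumption.

```javascript
// Uses the Node.js built-in Buffer; assumes text is encoded as UTF-8.
function encodeBase64(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

function decodeBase64(encoded) {
  return Buffer.from(encoded, "base64").toString("utf8");
}

const encoded = encodeBase64("Hello, world!");
console.log(encoded);               // SGVsbG8sIHdvcmxkIQ==
console.log(decodeBase64(encoded)); // Hello, world!
```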

Data reference $tools.data.*

This tool is essentially an object that contains a large amount of preset information - languages, regions, domains for search engines, etc. The full list of elements (may change in the future):

"YandexWordStatRegions", "TopDomains", "CountryCodes", "YahooLocalDomains", "GoogleDomains", "BingTranslatorLangs", "Top1000Words", "GoogleLangs", "GoogleInterfaceLangs", "EnglishMonths", "GoogleTrendsCountries"

Each of these elements is an array or hash of data. You can view the contents by outputting the data, for example, as JSON:

[% tools.data.GoogleDomains.json() %]

In-memory data storage $tools.memory.*

A simple in-memory key/value storage shared across all tasks, API requests, etc. It is reset when the scraper is restarted. There are three methods:

  • [% tools.memory.set(key, value) %] - stores the given value under the given key
  • [% tools.memory.get(key) %] - returns the value stored under the given key
  • [% tools.memory.delete(key) %] - removes the record with the given key from memory
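The semantics of the three methods can be pictured as a single process-wide map. The sketch below is a hypothetical model built on a JavaScript `Map`, not A-Parser's code; in particular, returning `undefined` for a missing key is `Map` behaviour and an assumption about the real tool.

```javascript
// Hypothetical in-process model of a shared key/value store.
const store = new Map(); // lives only as long as the process, like the real storage

const memory = {
  set: (key, value) => { store.set(key, value); },
  get: (key) => store.get(key), // undefined for a missing key (Map behaviour)
  delete: (key) => { store.delete(key); },
};

// Typical pattern: a counter that several tasks could read and update.
memory.set("pagesSeen", 0);
memory.set("pagesSeen", memory.get("pagesSeen") + 1);
console.log(memory.get("pagesSeen")); // 1

memory.delete("pagesSeen");
console.log(memory.get("pagesSeen")); // undefined
```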

Getting information about the A-Parser version $tools.aparser.version()

This tool allows you to get information about the version of A-Parser and display it in the result.

Usage example:

[% tools.aparser.version() %]

Getting the task ID and number of threads $tools.task.*

This tool returns the ID of the current task and the number of threads it uses. There are two methods:

  • [% tools.task.id %] - returns the task id
  • [% tools.task.threadsCount %] - returns the number of threads used in the task