Formatting requests
The request format lets you add substitutions and reshape a request into the desired form using templates. It is applied to every request.
Request formats
- Request format for the 1st scraper
- Request format for the 2nd scraper
- General request format
There are 2 ways to specify a template:
- General request format, which is processed first and supports substitutions
- Request format for each scraper - allows setting a specific format for individual scrapers
Let's analyze the example in the screenshot, assuming that we use a file with a list of domains like this:
google.com
a-parser.com
yandex.ru
The general request format is set as:
http://$query
The string http:// is prepended to each original request (domain), so google.com becomes http://google.com
The request format for the 1st scraper is left unchanged, so that scraper parses the request http://google.com
The request format for the 2nd scraper looks like this:
site:$query
For this scraper the request is transformed: http://google.com -> site:http://google.com
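The two-stage order described above (general format first, then each scraper's own format) can be sketched in Python. This is only an illustration of the order of operations, not A-Parser's implementation: the helper names `apply_format` and `format_requests` are hypothetical, and the plain string replacement handles only the bare `$query` placeholder, not the full template syntax.

```python
def apply_format(template: str, query: str) -> str:
    """Replace the $query placeholder with the current request text."""
    return template.replace("$query", query)


def format_requests(domains, general_format, scraper_formats):
    """Apply the general format first, then each scraper's own format."""
    results = []
    for domain in domains:
        # Stage 1: the general request format is processed first.
        general = apply_format(general_format, domain)
        # Stage 2: every scraper formats the already-formatted request.
        results.append([apply_format(fmt, general) for fmt in scraper_formats])
    return results


domains = ["google.com", "a-parser.com", "yandex.ru"]
# "$query" models the unchanged format of the 1st scraper.
formatted = format_requests(domains, "http://$query", ["$query", "site:$query"])
for first, second in formatted:
    print(first, "|", second)
```

For google.com the first scraper receives http://google.com and the second receives site:http://google.com, matching the walkthrough above.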
Templates in requests
The request format fully supports the Template Toolkit templating engine. The following variables are available:
- $query - the request after formatting through the general request format
- $query.num - the sequential number of the request
- $query.lvl - the nesting level of the request when using the Parse to level or Parse all results options
- $query.orig - the original request before formatting
- $query.first - the first request when using the Parse to level or Parse all results options
- $query.prev - the request from the previous level; works for HTML::LinkExtractor, $tools.query.add, and this.query.add in JS scrapers
- All variables created through Query Builder
Substitution macros
General request format supports the following macros:
Macro | Description | Examples |
---|---|---|
{az:START:END} | Substitution of an alphanumeric sequence. START indicates the beginning of the sequence, END indicates the end. The length of END must be greater than or equal to the length of START. The characters of END must come after the characters of START in alphabetical order. Any UTF-8 character sequences can be used | {az:a:z} - substitution of all characters from a to z (a, b, c, ..., y, z). {az:aaa:zzz} - substitution of all sequences from aaa to zzz (aaa, aab, aac, ..., zzy, zzz). {az:a:zz} - substitution of all sequences from a to zz (a, b, c, ... aa, ab, ..., zy, zz). {az:00:99} - substitution of all numbers from 00 to 99 (00, 01, 02, ..., 98, 99). {az:а:яяя} - substitution of all Cyrillic sequences from а to яяя (а, б, ... аа, аб, ... яяю, яяя) |
{each:WORD1,WORD2,...} | Substitution of the specified words WORD1, WORD2, etc., length is not limited | {each:green,blue,red,black} - substitution of the words green, blue, red, black. {each:,buy,sell} - substitution of an empty word, then buy and sell |
{subs:NAME} | Substitution of additional words from files in the queries/subs/ folder. Instead of NAME, you need to specify the file name, without the .txt extension | {subs:zones} - substitution of all lines from the file queries/subs/zones.txt |
{num:START:END} | The macro iterates over numbers in the specified range. START indicates the beginning of the interval, END indicates the end. Fractional numbers are supported. | {num:1:1000} - substitution of all numbers from 1 to 1000 (1, 2, 3 ..., 999, 1000) |
{num:START:END:STEP} | The macro iterates over numbers in the specified range, with the specified step. START indicates the beginning of the interval, END indicates the end, STEP indicates the step. Fractional numbers are supported. | {num:0:1000:10} - substitution of all numbers from 0 to 1000 with a step of 10 (0, 10, 20 ..., 990, 1000) |
{num:END:START} | The macro iterates over numbers in the specified range in reverse order. END indicates the end of the interval, START indicates the beginning. Fractional numbers are supported. | {num:1000:1} - substitution of all numbers from 1000 to 1 (1000, 999, 998, ..., 2, 1) |
{num:END:START:STEP} | The macro iterates over numbers in the specified range in reverse order, with the specified step. END indicates the end of the interval, START indicates the beginning, STEP indicates the step. Fractional numbers are supported. | {num:1000:1:10} - substitution of all numbers from 1000 to 1 with a step of 10 (1000, 990, 980, ..., 10, 1) |
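
The sequences produced by {az} and {num} in the table can be approximated with a short Python sketch. It rests on assumptions that the source does not confirm: `expand_az` and `expand_num` are hypothetical helpers, `expand_az` assumes START repeats the first character of a contiguous alphabet and END repeats the last one (as in all examples above), and `expand_num` simply stops at the last value that does not pass END, which may differ from the macro's behavior when STEP does not divide the range evenly.

```python
from decimal import Decimal
from itertools import product


def expand_az(start: str, end: str):
    """Yield every string from START to END, shortest lengths first.

    Assumes the alphabet is the contiguous code point range between the
    first character of START and the first character of END.
    """
    alphabet = [chr(code) for code in range(ord(start[0]), ord(end[0]) + 1)]
    for length in range(len(start), len(end) + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


def expand_num(start: str, end: str, step: str = "1"):
    """Return the numbers from START to END, counting down if START > END.

    Decimal keeps fractional steps exact (no binary rounding drift).
    """
    first, last, inc = Decimal(start), Decimal(end), Decimal(step)
    if inc <= 0:
        raise ValueError("step must be positive")
    values = []
    current = first
    if first <= last:
        while current <= last:
            values.append(current)
            current += inc
    else:
        while current >= last:
            values.append(current)
            current -= inc
    return values


print(len(list(expand_az("a", "zz"))))      # 26 + 26 * 26 = 702 strings
print(list(expand_az("00", "99"))[:3])      # ['00', '01', '02']
print(len(expand_num("0", "1000", "10")))   # 101 values: 0, 10, ..., 1000
```

Iterating by length first is what makes {az:a:zz} produce the single letters before the two-letter strings.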
⏩ Video: Substitution Macros
This video covers:
- the {num} macro with examples of page navigation and coordinate iteration in the scraper Maps::Google
- the {az} macro with an example of parsing with inurl: to increase the number of queries and, consequently, results
- the {each} macro with an example of parsing suggestions to generate phrases
Combining Substitution Macros
Substitution macros can be combined. Complex example:
$query site:{subs:zones} {az:aa:zz}
Suppose one of the scraping requests is viagra and the file queries/subs/zones.txt contains the zones com, net, and org. The following set of combinations is then sent for scraping:
viagra site:com ab
...
viagra site:net jj
...
viagra site:org zz
The total number of requests will correspond to the product of possible combinations:
1 request (viagra) x 3 zones ({subs:zones}) x 676 character variations ({az:aa:zz}) = 2028 requests
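The count can be checked with a small Python sketch that builds the Cartesian product of the three parts. The zone list stands in for the contents of queries/subs/zones.txt, and the iteration order shown here is an assumption; the order in which A-Parser actually emits the combinations is not stated in the source.

```python
from itertools import product
from string import ascii_lowercase

queries = ["viagra"]
zones = ["com", "net", "org"]  # assumed contents of queries/subs/zones.txt
# All two-letter strings aa..zz, standing in for {az:aa:zz}.
suffixes = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# One request per (query, zone, suffix) combination.
requests = [
    f"{query} site:{zone} {suffix}"
    for query, zone, suffix in product(queries, zones, suffixes)
]

print(len(suffixes))   # 676
print(len(requests))   # 2028
print(requests[0])     # viagra site:com aa
print(requests[-1])    # viagra site:org zz
```

Adding a second source request would double the total, since every macro multiplies the number of requests rather than adding to it.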