Presentation and Formatting of Results
Available Formats for Saving Results
A-Parser uses the Template Toolkit templating engine to format results, which makes it easy to save parsing results in various formats:
- In text files as a list: one result per line, separated by a delimiter, in an arbitrary format
- In CSV files with the possibility of further import into Excel, Google Docs, etc.
- In XML, JSON, and other data storage formats
- In HTML, generating pages "on the fly"
- In SQL dump format for direct import into a database, or by writing directly to a SQLite database
- In binary format for saving images (jpg, png, gif, ...), documents (pdf, docx, ...), executable files and archives (exe, dmg, zip, ...), and any other types of data
Editing the Result Format
The result format lets you shape results into the desired form using templates. It is applied to each query-results combination.
- The general result format is set in the Result format field
- The result format for each scraper can be set separately in the scraper settings, in Result format
A-Parser supports working with multiple scrapers in one task. In the general result format, you must indicate which scraper each result should be taken from:
- $p1 - results from the first scraper (SE::Google in the screenshot), $p2 - results from the second scraper (SE::Bing in the screenshot)
- The ordinal number of the scraper is displayed to the left of the scraper selection field
- $p1.preset and $p2.preset mean that the result format is taken from the settings of the corresponding scrapers
- In this example, $p1.preset can be replaced with $p1.serp.format('$link\n'), which has the same effect, while the result format from the settings will no longer be used
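As a rough illustration of what $p1.serp.format('$link\n') does, the Python sketch below applies a per-item template to every element of an array and joins the output. It only models the behavior described above and is not A-Parser's implementation. The helper name format_array is an assumption, and the sample entries are taken from the results table later in this article.

```python
from string import Template

def format_array(items, item_template):
    """Render item_template once per array element and join the pieces."""
    # Hypothetical helper: models the described effect of $p1.serp.format(...)
    return "".join(Template(item_template).substitute(item) for item in items)

# Sample $serp array with two entries
serp = [
    {"link": "http://www.speedtest.net/", "anchor": "Speedtest.net by Ookla - The Global Broadband Speed Test"},
    {"link": "http://test-ipv6.com/", "anchor": "Test your IPv6."},
]

# One link per line, as with the '$link\n' item template
output = format_array(serp, "$link\n")
print(output, end="")
```

An empty array produces an empty string, so a query without results adds nothing to the result file in this model.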
Result format can be specified in a convenient multi-line editor by clicking on the corresponding icon in the editing field:
The following variables are available in the general result format:
- $query - the query after formatting
- $query.* - all variables related to the query, described in the article Templates in queries
- $p1, $p2, ... - variables for accessing parsing results for each scraper separately (Viewing possible results for each scraper)
- $p1.query, $p2.query, ... - queries after formatting, taking into account the query format specified in the settings of each scraper
Prepend and Append Text
For each result file, separate Prepend/Append text is specified:
- For forming the header of a CSV file
- For initial and final tags of an XML file
- For the header, heading, and footer of HTML files
- For any other applications
To activate this feature, click on the More options button at the bottom of the Task Editor
The initial and final text supports the use of the Template Toolkit templating engine, available variables:
- $query - the query after formatting
- $query.* - all variables related to the query, described in the article Templates in queries
Important! These variables are only available when saving each query in a separate file or when using the same variables in the Result file name format.
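One way to picture how Prepend/Append text wraps a result file is the following Python sketch, which assembles a CSV file and an XML file from a prepend text, the formatted results, and an append text. The function name build_result_file, the column names, and the XML tag names are assumptions made for illustration only.

```python
def build_result_file(prepend, results, append):
    """Concatenate the prepend text, each formatted result, and the append text."""
    return prepend + "".join(results) + append

# CSV case: the prepend text carries the column header, no append text is needed
csv_file = build_result_file(
    "link,anchor\n",
    ["http://test-ipv6.com/,Test your IPv6.\n"],
    "",
)

# XML case: prepend and append hold the opening and closing tags
xml_file = build_result_file(
    "<results>\n",
    ["<link>http://test-ipv6.com/</link>\n"],
    "</results>\n",
)

print(csv_file + xml_file, end="")
```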
Result file name format
A-Parser allows you to use templates in the names of the resulting files as well, which allows you to automatically create files and folders based on the current date, by the query's serial number, by the query itself, and in any other format.
The following variables are supported in the File name field:
- All variables available for the General result format
- $queriesfile - the path and name of the file with queries; if queries are specified through the form, it will contain queries_from_text.txt
- $datefile - an object of the date plugin of the Template Toolkit templating engine, configured with the date format %b-%d_%H-%M-%S; when formatted, it outputs the current date and time, for example May-08_20-08-38. The format can be changed in Additional settings
By default, the file name is created based on the date and time at the start of the task
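The %b-%d_%H-%M-%S pattern uses strftime-style directives, so its output can be previewed outside A-Parser. The Python sketch below formats a fixed timestamp with the same pattern. The year is an assumption (the pattern does not include one), and %b yields English month abbreviations only in the default C locale.

```python
from datetime import datetime

DATEFILE_FORMAT = "%b-%d_%H-%M-%S"

# Fixed timestamp for a reproducible result; the year is an assumption
started = datetime(2014, 5, 8, 20, 8, 38)

stamp = started.strftime(DATEFILE_FORMAT)
print(stamp)  # May-08_20-08-38 in the default C locale
```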
Complex example
reports/$queriesfile/${query}.txt
- A folder named reports will be created
- A subfolder with the name of the query file will be created
- In the subfolder, as many files will be created as there are queries in the task; the query itself will be used as the file name, with the .txt extension
The $query variable is written as ${query} to prevent the .txt extension from being interpolated as part of the variable. More details can be found in the documentation on the Template Toolkit templating engine.
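To see why the braces matter, consider a minimal interpolator written in Python that treats a dot after a variable name as member access, in the same spirit as dotted variables in templates. It is a simplified model built for this example, not the Template Toolkit parser, and the output of the unbraced form in the real engine may differ. The function names and the sample values are hypothetical.

```python
import re

# Matches ${name.path} (explicitly delimited) or $name.path (greedy dotted path)
_VARIABLE = re.compile(r"\$\{(\w+(?:\.\w+)*)\}|\$(\w+(?:\.\w+)*)")

def resolve(path, variables):
    """Walk a dotted path through nested dicts and lists; missing parts give ''."""
    value = variables
    for part in path.split("."):
        if isinstance(value, dict) and part in value:
            value = value[part]
        elif isinstance(value, list) and part.isdigit() and int(part) < len(value):
            value = value[int(part)]
        else:
            return ""
    return str(value)

def interpolate(template, variables):
    """Replace every variable reference in template with its resolved value."""
    return _VARIABLE.sub(lambda m: resolve(m.group(1) or m.group(2), variables), template)

variables = {"queriesfile": "queries", "query": "test"}  # hypothetical values

with_braces = interpolate("reports/$queriesfile/${query}.txt", variables)
without_braces = interpolate("reports/$queriesfile/$query.txt", variables)
print(with_braces)     # reports/queries/test.txt
print(without_braces)  # reports/queries/
```

In this model the unbraced $query.txt is read as "the txt member of query", which does not exist, so the file name comes out empty. The braces end the variable name before the dot.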
⏩ Video. Naming result files
This video presents several examples of naming the result file:
- Numbering the result file according to the queries.
- Numbering the result file + part of the query name.
- Naming the result file by the query, if the query is a link.
Viewing available results
Each scraper has its own set of results. To view the list of available results, hover over the scraper: a tooltip will display the simple results and the arrays, together with their nested elements:
Results common to all scrapers are highlighted in yellow:
- $query - the query passed to the scraper after formatting
- $query.orig - the original query (as it was in the file or in the query input field)
- $query.first - the first query when using nested parsing options (Parse all results or Parse to level)
- $info.success - information about the success of parsing this query
- $info.retries - the number of attempts used for this query
- $info.stats - statistics of the scraper's work for this query
- $pages.$i.data - an array with raw server responses, which allows you to extract additional information yourself
- $totalcount - the number of search results
- $ads with elements $link, $anchor, $visiblelink, $snippet, $position, and $page - an array with a list of ads
- $related.$i.key - an array with a list of related keywords
- $serp with elements $link, $anchor, $snippet, $cache - an array with the main search engine results
Please note that for arrays, the variable $i is explicitly specified, indicating that there are multiple elements and that they can be accessed by index (position number) or iterated over in a loop.
The result $pages.$i.data will automatically be changed to $data for those scrapers that do not "navigate through pages" within a single request, for example DeepL::Translator.
Presentation of Results
A-Parser was created for scraping information of all kinds. To support this, two types of results were introduced:
- Simple Results (Flat)
- Arrays of Results (Array)
Let's consider each type using the example of the scraper SE::Google, screenshot of the search results:
Simple Results
Simple results are those where one request corresponds to one result. Examples:
- The number of results for a query ($totalcount)
- Whether the query is a typo ($misspell, not shown in the screenshot)
Other examples:
- The value of the translated text ($translated) in the scraper DeepL::Translator
- The number of referring domains ($domains), trust value ($trustflow), backlinks ($backlinks), etc. in the scraper Rank::MajesticSEO
Simple results are stored in regular variables (prefix $ + name in Latin script)
Arrays of Results
Arrays of results are those where one request corresponds to a list of results, and each item in the list may contain several nested elements. Take Google search results as an example: in the scraper they are represented by the array $serp. For clarity, the first 5 results are written down as a table:
Link ($link) | Anchor ($anchor) | Snippet ($snippet) |
---|---|---|
http://www.speedtest.net/ | Speedtest.net by Ookla - The Global Broadband Speed Test | Test your Internet connection bandwidth to locations around the world with this interactive broadband speed test from Ookla. |
http://en.wikipedia.org/wiki/Test_cricket | Test cricket - Wikipedia, the free encyclopedia | Test cricket is the longest form of the sport of cricket. Test matches are played between national representative teams with "Test status", as determined by the ... |
http://www.speakeasy.net/speedtest/ | Speakeasy Speed Test | Saturday 03-May 2014, 11:04:29 AM Your IP: The Speakeasy Speed Test requires Flash v7 or higher. Please update your browser. See Pricing Or Call Today |
http://www.humanmetrics.com/cgi-win/jtypes2.asp | Personality test based on C. Jung and I. Briggs Myers type theory | Humanmetrics Jung Typology Test™ instrument uses methodology, questionnaire, scoring and software that are proprietary to Humanmetrics, and shall not be ... |
http://test-ipv6.com/ | Test your IPv6. | This will test your browser and connection for IPv6 readiness, as well as show you your current IPV4 and IPv6 address. ... Test your IPv6 connectivity. JavaScript ... |
Each search position is recorded in an array with 3 nested elements - link ($link), anchor ($anchor), snippet ($snippet)
Another example - a list of related keywords, which is stored in the array $related:
Keyword ($key) |
---|
test wwe |
depression test |
test my speed |
wonderlic test |
test personality |
act test |
jiggle test |
bipolar test |
As you can see, this array has only one nested element - keyword ($key)
The numbering of array elements starts from 0. Examples of accessing individual array elements:
- $serp.0.link - the first link from the search results
- $serp.3.anchor - the fourth anchor from the search results
- $related.0.key - the first related keyword
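The same zero-based access can be mirrored in Python, with each array modeled as a list of dictionaries. This is only an analogy for how the indices map to positions. The data is copied from the tables above, and the variable names on the left are chosen for the example.

```python
# $serp modeled as a list of dicts; the first four rows of the table above
serp = [
    {"link": "http://www.speedtest.net/", "anchor": "Speedtest.net by Ookla - The Global Broadband Speed Test"},
    {"link": "http://en.wikipedia.org/wiki/Test_cricket", "anchor": "Test cricket - Wikipedia, the free encyclopedia"},
    {"link": "http://www.speakeasy.net/speedtest/", "anchor": "Speakeasy Speed Test"},
    {"link": "http://www.humanmetrics.com/cgi-win/jtypes2.asp", "anchor": "Personality test based on C. Jung and I. Briggs Myers type theory"},
]
# $related modeled the same way, with a single nested element
related = [{"key": "test wwe"}, {"key": "depression test"}]

first_link = serp[0]["link"]       # counterpart of $serp.0.link
fourth_anchor = serp[3]["anchor"]  # counterpart of $serp.3.anchor
first_key = related[0]["key"]      # counterpart of $related.0.key

# Iterating over every element instead of addressing a single index
all_links = [item["link"] for item in serp]

print(first_link, fourth_anchor, first_key, sep="\n")
```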
More details about the formatting of simple results and arrays will be described below
Formatting Principles
After the scraper has collected data into simple results and arrays, they need to be displayed (saved to a file) in the desired format. For convenience and functionality, A-Parser uses the Template Toolkit templating engine. Let's look at frequently used constructs with the help of the Templates Testing tool. First, select a project for the scraper SE::Google: