Helper methods (utils, tools, sleep)
this.utils.*
.updateResultsData(results, data)
await this.utils.updateResultsData(results, data) - automatically fills $pages.$i.data and $data. It must be called to add content to the resulting page.
.urlFromHTML(url, base)
await this.utils.urlFromHTML(url, base) - processes a link obtained from HTML code by decoding entities (&amp; etc.). Optionally, base can be passed (the base URL, e.g., the URL of the source page) so that a full link is returned.
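The behaviour described above can be approximated in plain JavaScript. This is only a sketch, not A-Parser's implementation: resolveHtmlLink is a hypothetical name, only a handful of entities are decoded, and the standard URL class is assumed to do the resolution against base.

```javascript
// Hypothetical stand-in for this.utils.urlFromHTML; the real helper may decode more entities.
const ENTITIES = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };

function resolveHtmlLink(url, base) {
  // Decode the few HTML entities that commonly appear inside href values.
  const decoded = url.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => ENTITIES[entity]);
  // Without a base the decoded link is returned unchanged.
  if (!base) return decoded;
  // The WHATWG URL constructor resolves relative links against the base URL.
  return new URL(decoded, base).href;
}

console.log(resolveHtmlLink('/search?q=a&amp;page=2', 'https://example.com/catalog/'));
// https://example.com/search?q=a&page=2
```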
.url.extractDomain(url, removeDefaultSubdomain)
await this.utils.url.extractDomain(url, removeDefaultSubdomain) - the method takes a link as the first parameter and returns the domain from this link. The second optional parameter determines whether to trim the www subdomain from the domain. Default is 0 - meaning do not trim.
.url.extractTopDomain(url)
await this.utils.url.extractTopDomain(url) - the method takes a link as the first parameter and returns the domain from this link, without subdomains.
.url.extractTopDomainByZone(url)
await this.utils.url.extractTopDomainByZone(url) - the method takes a link as the first parameter and returns the domain from this link without subdomains. Works with all regional zones.
.url.extractMaxPath(url)
await this.utils.url.extractMaxPath(url) - the method takes a string and extracts a URL from it
.url.extractWOParams(url)
await this.utils.url.extractWOParams(url) - the method takes a link and returns the same link with the parameter string cut off. That is, it returns the URL up to the ? character.
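To make the difference between these URL helpers concrete, here is a rough sketch of comparable logic built on the standard URL class. The function names are hypothetical and the results only approximate what the this.utils.url.* methods are described as returning.

```javascript
// Hypothetical approximations of the this.utils.url.* helpers, built on the standard URL class.
function domainOf(url, removeDefaultSubdomain = 0) {
  const host = new URL(url).hostname;
  // Trim a leading "www." only when the flag is truthy (the documented default is 0).
  return removeDefaultSubdomain ? host.replace(/^www\./, '') : host;
}

function naiveTopDomain(url) {
  // Keeps the last two labels only, so it is wrong for zones such as co.uk.
  return new URL(url).hostname.split('.').slice(-2).join('.');
}

function withoutParams(url) {
  // Everything before the first "?" is kept.
  return url.split('?')[0];
}

const link = 'https://www.example.com/items/list?page=2&sort=asc';
console.log(domainOf(link));        // www.example.com
console.log(domainOf(link, 1));     // example.com
console.log(naiveTopDomain(link));  // example.com
console.log(withoutParams(link));   // https://www.example.com/items/list
```

The two-label shortcut in naiveTopDomain breaks on multi-part zones, which is presumably the case extractTopDomainByZone is meant to cover.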
.removeHtml(string)
await this.utils.removeHtml(string) - the method takes a string and returns it with HTML tags removed
.removeNoDigit(string)
await this.utils.removeNoDigit(string) - the method takes a string, removes everything from it except digits, and returns the result
.removeComma(string)
await this.utils.removeComma(string) - the method takes a string, removes characters such as .,\r\n from it, and returns the result
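The three string-cleaning helpers can be pictured as simple regular-expression replacements. The sketch below uses hypothetical function names and assumes the simplest possible patterns; the built-in methods may handle edge cases (whitespace, malformed tags) differently.

```javascript
// Hypothetical regex equivalents; the real helpers may treat edge cases differently.
const stripTags = (s) => s.replace(/<[^>]*>/g, '');           // drop anything shaped like a tag
const digitsOnly = (s) => s.replace(/\D/g, '');               // keep digits 0-9 only
const stripPunctuation = (s) => s.replace(/[.,\r\n]/g, '');   // drop dots, commas, CR and LF

console.log(stripTags('<b>Price:</b> <i>1,299.50</i>')); // Price: 1,299.50
console.log(digitsOnly('Price: 1,299.50 USD'));           // 129950
console.log(stripPunctuation('1,299.50\r\n'));            // 129950
```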
.getAllBlocks(html, regexp, opts?)
await this.utils.getAllBlocks(html, regexp, opts?) - gets all blocks on a page together with their corresponding closing tags. The method takes an HTML string and a regular expression that points to the start of a block (any block with a paired closing tag, for example <div>...</div>) and returns an array of all blocks found.
Options opts:
searchStartIndex - specifies the index in the string from which to start the search, default is 0
const blocks = this.utils.getAllBlocks(html, /<div [^>]*?class="results"/)
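The idea of matching a block start and then finding its paired closing tag can be sketched with a depth counter. This is an illustration under assumptions, not A-Parser's code: findAllBlocks is a hypothetical name, the start pattern is assumed to begin at an opening tag, void or self-closing tags of the same name are not handled, and nested matches inside an already returned block are skipped.

```javascript
// Hypothetical stand-in for this.utils.getAllBlocks; not A-Parser's implementation.
function findAllBlocks(html, startRe, opts = {}) {
  const blocks = [];
  // Copy the regexp with the global flag so lastIndex can drive the scan.
  const flags = startRe.flags.includes('g') ? startRe.flags : startRe.flags + 'g';
  const re = new RegExp(startRe.source, flags);
  re.lastIndex = opts.searchStartIndex || 0;

  let start;
  while ((start = re.exec(html)) !== null) {
    if (start[0] === '') { re.lastIndex++; continue; } // guard against empty matches
    const name = /^<([a-zA-Z][\w-]*)/.exec(start[0]);
    if (!name) continue; // the pattern did not begin at an opening tag

    // Walk every opening and closing tag of that name, tracking nesting depth.
    const tagRe = new RegExp(`<(/?)${name[1]}\\b[^>]*>`, 'gi');
    tagRe.lastIndex = start.index;
    let depth = 0;
    let end = -1;
    let tag;
    while ((tag = tagRe.exec(html)) !== null) {
      depth += tag[1] === '/' ? -1 : 1;
      if (depth === 0) { end = tagRe.lastIndex; break; }
    }
    if (end === -1) break; // unbalanced markup: stop rather than guess

    blocks.push(html.slice(start.index, end));
    re.lastIndex = end; // continue after the block, so nested matches are skipped
  }
  return blocks;
}

const sample = '<div class="results"><div>a</div></div><p>x</p><div class="results">b</div>';
console.log(findAllBlocks(sample, /<div [^>]*?class="results"/).length); // 2
```

Passing { searchStartIndex: 5 } with the same sample skips the first block, because the scan starts after its opening tag.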
.getAllBlocksByAttr(html, tag, attrName, attrRegExp, opts?)
await this.utils.getAllBlocksByAttr(html, tag, attrName, attrRegExp, opts?) - a method similar to .getAllBlocks. Instead of a regular expression for finding the start of a block, it takes the tag name, the name of the attribute to search by (e.g., id, class), and a regular expression that is applied to the value of that attribute.
const blocks = this.utils.getAllBlocksByAttr(html, 'div', 'class', /results/)
await tools.*
The global tools object gives access to built-in A-Parser functions.
It is the analog of the template engine's $tools.*
tools.query is unavailable; use this.query instead.
await tools.createTemplate(string)
Allows using the Template Toolkit template engine inside a JavaScript parser.
let template = await tools.createTemplate("Hello [% content %]!")
template = typeof template == 'function' ? await template({content: 'World'}) : template
this.logger.put(template) // Output: Hello World!
Usage example: "Parser for translating Markdown markup to HTML"
await this.sleep(sec)
await this.sleep(sec)
Pauses the thread for the given number of seconds (sec); the value can be fractional.
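Outside A-Parser, a delay measured in (possibly fractional) seconds is commonly built on setTimeout. The sketch below assumes that shape; sleep here is a standalone hypothetical function, not the this.sleep method itself.

```javascript
// Hypothetical stand-in for this.sleep: resolves after `sec` seconds (fractions allowed).
function sleep(sec) {
  return new Promise((resolve) => setTimeout(resolve, sec * 1000));
}

async function demo() {
  const started = Date.now();
  await sleep(0.05); // 0.05 s = 50 ms
  return Date.now() - started;
}

const elapsed = demo();
elapsed.then((ms) => console.log(`waited about ${ms} ms`));
```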
await this.mutex.*
A mutex for synchronization between threads; it locks a code section so that only one thread executes it at a time
.lock()
Waits for the lock. Execution continues in the first thread that acquires the lock; other threads wait until the lock is released
.unlock()
Releases the lock. The next thread that was waiting in .lock() then continues execution
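The lock/unlock contract can be illustrated with a small promise-based mutex. This is a sketch under assumptions: SimpleMutex and worker are hypothetical names, and async functions in one process stand in for A-Parser threads. It only shows the usual pattern of releasing the lock in a finally block.

```javascript
// Hypothetical promise-based mutex; this.mutex in A-Parser is provided by the runtime.
class SimpleMutex {
  constructor() {
    this.locked = false;
    this.waiters = [];
  }
  lock() {
    if (!this.locked) {
      this.locked = true;
      return Promise.resolve();
    }
    // Already held: queue the caller until unlock() hands the lock over.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  unlock() {
    const next = this.waiters.shift();
    if (next) next();          // pass the lock straight to the next waiter
    else this.locked = false;  // nobody is waiting, so the lock becomes free
  }
}

const mutex = new SimpleMutex();
const log = [];

async function worker(id) {
  await mutex.lock();
  try {
    log.push(`start ${id}`);
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated critical work
    log.push(`end ${id}`);
  } finally {
    mutex.unlock(); // always release, even if the critical section throws
  }
}

const done = Promise.all([worker(1), worker(2)]);
done.then(() => console.log(log.join(', '))); // start 1, end 1, start 2, end 2
```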
results.<array>.addElement()
The results.<array>.addElement() method allows for more convenient filling of arrays in results. When using it, you don't need to remember the sequence of variables in the array and list them manually.
results.serp.addElement({
    link: 'https://google.com',
    anchor: 'Google',
    snippet: 'Lorem ipsum...',
});
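One way to picture why addElement removes the need to remember the variable order is the model below. It assumes, for illustration only, that a results array is a flat list filled in a fixed field order; createResultArray and the field list are hypothetical and do not describe A-Parser internals.

```javascript
// Hypothetical model: a results array stored as a flat list in a fixed field order.
function createResultArray(fieldOrder) {
  const arr = [];
  arr.addElement = (element) => {
    // Push values in the declared order, regardless of the key order in `element`.
    for (const field of fieldOrder) arr.push(element[field]);
    return arr;
  };
  return arr;
}

const serp = createResultArray(['link', 'anchor', 'snippet']);
// Keys are given in a different order on purpose.
serp.addElement({ snippet: 'Example snippet', anchor: 'Google', link: 'https://google.com' });
console.log(JSON.stringify(serp)); // ["https://google.com","Google","Example snippet"]
```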
this.isContextAlive()
This method is needed for long-lived threads that process queries in a loop; it lets them terminate correctly when a task is stopped or deleted
while (this.isContextAlive()) {
    await this.request(...)
}