
Helper methods (utils, tools, sleep)

this.utils.*

.updateResultsData(results, data)

await this.utils.updateResultsData(results, data) - automatically fills $pages.$i.data and $data; it must be called to add content to the resulting page

.urlFromHTML(url, base)

await this.utils.urlFromHTML(url, base) - processes a link obtained from HTML code: decodes HTML entities (&amp; etc.). The optional base argument is the base URL (e.g., the URL of the source page), which allows a full link to be obtained

.url.extractDomain(url, removeDefaultSubdomain)

await this.utils.url.extractDomain(url, removeDefaultSubdomain) - the method takes a link as the first parameter and returns the domain from this link. The second optional parameter determines whether to trim the www subdomain from the domain. Default is 0 - meaning do not trim.

.url.extractTopDomain(url)

await this.utils.url.extractTopDomain(url) - the method takes a link as the first parameter and returns the domain from this link, without subdomains.

.url.extractTopDomainByZone(url)

await this.utils.url.extractTopDomainByZone(url) - the method takes a link as the first parameter and returns the domain from this link, without subdomains. Works with all regional zones

.url.extractMaxPath(url)

await this.utils.url.extractMaxPath(url) - the method takes a string and extracts a URL from it

.url.extractWOParams(url)

await this.utils.url.extractWOParams(url) - the method takes a link and returns the same link with the parameter string trimmed off. That is, it returns the URL up to the ? character
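
The URL helpers above are only available inside A-Parser, so the following is a plain JavaScript approximation of the behavior described for extractDomain and extractWOParams, built on the standard URL class. The names domainOf and withoutParams are hypothetical stand-ins, not A-Parser's implementation, and edge cases may differ from what this.utils.url.* actually returns.

```javascript
// Hypothetical stand-ins for this.utils.url.extractDomain / extractWOParams.
// They mirror the behavior described above, not A-Parser's real code.
function domainOf(url, removeDefaultSubdomain = 0) {
  const host = new URL(url).hostname;
  // Strip a leading "www." only when the flag is set (default 0: keep it).
  return removeDefaultSubdomain ? host.replace(/^www\./, '') : host;
}

function withoutParams(url) {
  // Keep everything before the first "?"; return the URL unchanged if there is none.
  const i = url.indexOf('?');
  return i === -1 ? url : url.slice(0, i);
}

console.log(domainOf('https://www.example.com/path?a=1'));      // www.example.com
console.log(domainOf('https://www.example.com/path?a=1', 1));   // example.com
console.log(withoutParams('https://www.example.com/path?a=1')); // https://www.example.com/path
```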

.removeHtml(string)

await this.utils.removeHtml(string) - the method takes a string and returns it cleared from HTML tags

.removeNoDigit(string)

await this.utils.removeNoDigit(string) - the method takes a string, removes everything from it except digits, and returns the result

.removeComma(string)

await this.utils.removeComma(string) - the method takes a string, removes characters such as .,\r\n from it, and returns the result
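
To make the effect of the string helpers concrete, here is a small plain JavaScript approximation of removeHtml and removeNoDigit. The functions stripTags and digitsOnly are hypothetical names used for illustration, and the tag-stripping regular expression is a naive assumption, so the real helpers may treat unusual markup differently.

```javascript
// Hypothetical approximations of this.utils.removeHtml / removeNoDigit.
function stripTags(s) {
  // Naive removal of anything that looks like <...>; not a full HTML parser.
  return s.replace(/<[^>]*>/g, '');
}

function digitsOnly(s) {
  // Drop every character that is not a digit 0-9.
  return s.replace(/\D/g, '');
}

const cell = '<b>Price:</b> 1,299 USD';
console.log(stripTags(cell));             // Price: 1,299 USD
console.log(digitsOnly(stripTags(cell))); // 1299
```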

.getAllBlocks(html, regexp, opts?)

await this.utils.getAllBlocks(html, regexp, opts?) - returns all blocks on a page together with their matching closing tags. The method takes an HTML string and a regular expression that points to the start of a block (any block with a paired closing tag, for example <div>...</div>) and returns an array of all blocks found

Options opts:

  • searchStartIndex - specifies the index in the string from which to start the search, default is 0
const blocks = this.utils.getAllBlocks(html, /<div [^>]*?class="results"/)

.getAllBlocksByAttr(html, tag, attrName, attrRegExp, opts?)

await this.utils.getAllBlocksByAttr(html, tag, attrName, attrRegExp, opts?) - a method similar to .getAllBlocks. Instead of a regular expression for finding the start of a block, it takes the tag name, the name of the attribute to search by (e.g., id, class), and a regular expression that is applied to the value of that attribute

const blocks = this.utils.getAllBlocksByAttr(html, 'div', 'class', /results/)
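
The idea of matching a block up to its paired closing tag can be sketched in plain JavaScript by counting nesting depth. This is only an illustration under stated assumptions: findBlocks is a hypothetical function, its explicit tag parameter is an addition for simplicity, the HTML is assumed to be well formed, and nothing here reflects how A-Parser implements getAllBlocks internally.

```javascript
// Hypothetical sketch of paired-tag block extraction; not A-Parser's code.
// startRe must match the beginning of the block's opening tag.
function findBlocks(html, startRe, tag) {
  const flags = startRe.flags.includes('g') ? startRe.flags : startRe.flags + 'g';
  const re = new RegExp(startRe.source, flags);
  // Matches both opening and closing tags of the given name.
  const tagRe = new RegExp(`<(/?)${tag}\\b[^>]*>`, 'gi');
  const blocks = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    if (m[0].length === 0) { re.lastIndex++; continue; } // guard against empty matches
    let depth = 0;
    let t;
    tagRe.lastIndex = m.index;
    while ((t = tagRe.exec(html)) !== null) {
      depth += t[1] ? -1 : 1; // closing tag lowers depth, opening tag raises it
      if (depth === 0) {
        blocks.push(html.slice(m.index, tagRe.lastIndex));
        re.lastIndex = tagRe.lastIndex; // assumption: resume after the whole block
        break;
      }
    }
  }
  return blocks;
}

const sample = '<div class="results"><div>a</div></div><div class="results">b</div>';
console.log(findBlocks(sample, /<div [^>]*?class="results"/, 'div'));
```

With the sample string, the nested inner div stays inside the first block instead of ending it early, which is the point of counting depth rather than searching for the first closing tag.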

await tools.*

The global tools object gives access to built-in A-Parser functions

It is the counterpart of the template engine tools $tools.*

note

tools.query is unavailable; use this.query instead

await tools.createTemplate(string)

Allows using the Template Toolkit template engine inside a JavaScript parser.

let template = await tools.createTemplate("Hello [% content %]!")
template = typeof template == 'function' ? await template({content: 'World'}) : template
this.logger.put(template) // Output: Hello World!

Example of use "Parser for translating Markdown markup to HTML"

await this.sleep(sec)

await this.sleep(sec)

Pauses the thread for the given number of seconds (sec); the value can be fractional.

await this.mutex.*

A mutex for synchronization between threads; it allows a code section to be locked so that only one thread executes it at a time

.lock()

Waits for the lock. Execution continues in the first thread that acquires the lock; other threads wait until the lock is released

.unlock()

Releases the lock. The next thread that was waiting in .lock() then continues execution
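
The lock/unlock pairing described above is usually wrapped in try/finally so the lock is released even when the protected section throws. The sketch below runs outside A-Parser by using SketchMutex, a hypothetical stand-in for this.mutex. Its internals are an assumption and only the lock()/unlock() calling pattern is meant to carry over.

```javascript
// Hypothetical stand-in for this.mutex so the pattern runs outside A-Parser.
class SketchMutex {
  constructor() { this.locked = false; this.waiters = []; }
  lock() {
    if (!this.locked) { this.locked = true; return Promise.resolve(); }
    // Already locked: queue the caller until unlock() hands the lock over.
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  unlock() {
    const next = this.waiters.shift();
    if (next) next(); else this.locked = false;
  }
}

const mutex = new SketchMutex();
const log = [];

async function worker(id) {
  await mutex.lock();
  try {
    log.push(`start ${id}`);
    await new Promise((r) => setTimeout(r, 10)); // simulated critical work
    log.push(`end ${id}`);
  } finally {
    mutex.unlock(); // always release, even if the section throws
  }
}

const done = Promise.all([worker(1), worker(2)]).then(() => log);
done.then((entries) => console.log(entries.join(', ')));
```

Because the second worker waits in lock(), its entries appear only after the first worker has finished its section.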

results.<array>.addElement()

The results.<array>.addElement() method allows for more convenient filling of arrays in results. When using it, you don't need to remember the sequence of variables in the array and list them manually.

results.serp.addElement({
    link: 'https://google.com',
    anchor: 'Google',
    snippet: 'Lorem ipsum...',
});

this.isContextAlive()

This method is necessary for long-lived threads that process queries in a loop, allowing for correct termination when a task is stopped or deleted

while (this.isContextAlive()) {
    await this.request(...)
}
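
To show how such a loop ends, here is a runnable sketch with a stubbed context. The ctx object, its remaining counter, and the single-argument request stub are all hypothetical and exist only so the loop can run outside A-Parser. In a real parser the calls would be this.isContextAlive(), this.request(...) and this.sleep(sec).

```javascript
// Hypothetical stub of the parser context; not A-Parser's API surface.
const ctx = {
  remaining: 3, // pretend the task is stopped after three iterations
  processed: [],
  isContextAlive() { return this.remaining > 0; },
  async request(url) { this.remaining -= 1; this.processed.push(url); },
  sleep(sec) { return new Promise((r) => setTimeout(r, sec * 1000)); },
};

async function runLoop() {
  let i = 0;
  // The check happens before every iteration, so the loop exits cleanly
  // once the context reports that the task was stopped or deleted.
  while (ctx.isContextAlive()) {
    await ctx.request(`https://example.com/page/${i++}`);
    await ctx.sleep(0.01); // fractional seconds, as this.sleep allows
  }
  return ctx.processed.length;
}

const finished = runLoop();
finished.then((n) => console.log(`processed ${n} requests`));
```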