JavaScript Scrapers: Overview of Capabilities
JavaScript scrapers let you create your own full-fledged scrapers with logic of any complexity in the JavaScript language. Furthermore, JS scrapers can use all the functionality of standard scrapers.

Features
Using the full power of A-Parser, you can write your own scraper, registrator, or poster with logic of any complexity. Code is written in JavaScript with ES6 capabilities (the V8 engine).
The scraper code is highly concise, allowing you to focus on writing the logic; A-Parser handles multithreading, networking, proxies, results, logs, etc. The code can be written directly in the scraper interface by adding a new scraper in the Scraper Editor. You can also use third-party editors, such as VSCode, for writing scrapers.
Automatic versioning is used when saving scraper code via the built-in editor.
Working with JavaScript scrapers is available for Pro and Enterprise licenses.
Access to the JS Scraper Editor
If A-Parser is used remotely, the JS Scraper Editor is not available by default for security reasons. To grant access to it, you must:
- Set a password on the Settings tab -> General Settings
- Add the following line to config/config.txt: `allow_javascript_editor: 1`
- Restart A-Parser
Manual
In the Scraper Editor, create a new scraper and specify the scraper name. A simple example will be loaded by default, based on which you can quickly start creating your own scraper.
If a third-party editor is used for writing the code, open the scraper file for editing in the /parsers/ directory (see File structure of the installed program).
Once the code is ready, save it and use it like a regular scraper: select the created scraper in the Task Editor, and if necessary, you can set the required parameters, thread configuration, file name, etc.
The created scraper can be edited at any time. All changes related to the interface will appear after reselecting the scraper in the scraper list or restarting A-Parser; changes to the scraper's logic are applied the next time a task with this scraper is started.
A standard icon is displayed by default for each created scraper; you can add your own in PNG or ICO format by placing it in the scraper's folder in /parsers/.
General principles of operation
By default, a simple scraper example is created, ready for further editing.
- TypeScript
- JavaScript
```ts
import { BaseParser } from 'a-parser-types';

export class JS_v2_example extends BaseParser {
    static defaultConf: typeof BaseParser.defaultConf = {
        version: '0.0.1',
        // result fields available in the results template
        results: {
            flat: [
                ['title', 'HTML title'],
            ]
        },
        // maximum response size to download, in bytes
        max_size: 2 * 1024 * 1024,
        // HTTP response codes treated as successful
        parsecodes: {
            200: 1,
        },
        results_format: '$query: $title\\n',
    };

    static editableConf: typeof BaseParser.editableConf = [];

    async parse(set, results) {
        this.logger.put("Start scraping query: " + set.query);

        let response = await this.request('GET', set.query, {}, {
            // the response is considered valid only if it contains the closing </html> tag
            check_content: ['<\/html>'],
            decode: 'auto-html',
        });

        if (response.success) {
            let matches = response.data.match(/<title>(.*?)<\/title>/i);
            if (matches)
                results.title = matches[1];
        }

        results.success = response.success;
        return results;
    }
}
```
```js
const { BaseParser } = require("a-parser-types");

class JS_v2_example_js extends BaseParser {
    static defaultConf = {
        version: '0.0.1',
        // result fields available in the results template
        results: {
            flat: [
                ['title', 'HTML title'],
            ]
        },
        // maximum response size to download, in bytes
        max_size: 2 * 1024 * 1024,
        // HTTP response codes treated as successful
        parsecodes: {
            200: 1,
        },
        results_format: '$query: $title\\n',
    };

    static editableConf = [];

    async parse(set, results) {
        this.logger.put("Start scraping query: " + set.query);

        let response = await this.request('GET', set.query, {}, {
            // the response is considered valid only if it contains the closing </html> tag
            check_content: ['<\/html>'],
            decode: 'auto-html',
        });

        if (response.success) {
            let matches = response.data.match(/<title>(.*?)<\/title>/i);
            if (matches)
                results.title = matches[1];
        }

        results.success = response.success;
        return results;
    }
}
```
The constructor is called once for each task. You must define this.defaultConf.results and this.defaultConf.results_format; the remaining fields are optional and will take default values.
The this.editableConf array defines which settings can be modified by the user from the A-Parser interface. The following field types can be used:
- `combobox` - a dropdown selection menu. You can also create a selection menu for a standard scraper preset, for example:
  `['Util_AntiGate_preset', ['combobox', 'AntiGate preset']]`
- `combobox` with multi-select capability. You must additionally specify the parameter `{'multiSelect': 1}`:
  `['proxyCheckers', ['combobox', 'Proxy Checkers', {'multiSelect': 1}, ['*', 'All']]]`
- `checkbox` - a checkbox, for parameters that can only have two values (true/false)
- `textfield` - a single-line text field
- `textarea` - a text field with multi-line input
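Below is a minimal sketch of an editableConf combining these field types; the setting names (useProxy, userAgent, cookies) are hypothetical examples for illustration, not predefined A-Parser settings:

```js
static editableConf = [
    // checkbox: a true/false option (the names here are illustrative)
    ['useProxy', ['checkbox', 'Use proxy']],
    // textfield: a single-line text value
    ['userAgent', ['textfield', 'User-Agent header']],
    // textarea: a multi-line text value
    ['cookies', ['textarea', 'Cookies, one per line']],
    // combobox: a preset selection menu, as in the example above
    ['Util_AntiGate_preset', ['combobox', 'AntiGate preset']],
];
```

The values the user sets in the interface are then read from the scraper's configuration (this.conf) inside parse.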
The parse method is an asynchronous function and must use await for any blocking operation (this is the main and only difference from a regular function). The method is called for each query received for processing; set (a hash with the query and its parameters) and results (an empty results template) are always passed in. It must return the filled-in results after setting the success flag.
Automatic versioning
The version has the format Major.Minor.Revision
- TypeScript
- JavaScript
```ts
static defaultConf: typeof BaseParser.defaultConf = {
    version: '0.1.1',
    ...
}
```
```js
static defaultConf = {
    version: '0.1.1',
    ...
}
```
The value Revision (the last digit) automatically increases with each save. The other values (Major, Minor) can be changed manually, and Revision can be reset to 0.
If for some reason you need to change Revision only manually, enclose the version in double quotes ("").
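For example, a minimal sketch, assuming the double-quoted version is what disables the auto-increment described above:

```js
static defaultConf = {
    // double quotes: Revision is no longer bumped automatically on save
    version: "0.1.1",
    // ...the rest of the configuration stays unchanged
};
```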
Bulk Query Processing
In some cases, it may be necessary to take several queries from the queue at once and process them in a single operation. For example, this functionality is used in SE::Yandex::Direct::Frequency: when scraping with accounts, data is collected in batches of 10 queries.
To implement the same functionality in a JS scraper, set bulkQueries: N in this.defaultConf, where N is the required number of queries per batch. The scraper will then take queries from the queue in batches of N, and all queries of the current iteration will be contained in the set.bulkQueries array (including all standard variables: query.first, query.orig, query.prev, etc.). An example of such an array is shown below:
```json
[
    {
        "first": "test",
        "prev": "",
        "lvl": 0,
        "num": 0,
        "query": "test",
        "queryUid": "6eb301",
        "orig": "test"
    },
    {
        "first": "check",
        "prev": "",
        "lvl": 0,
        "num": 1,
        "query": "check",
        "queryUid": "774563",
        "orig": "check"
    },
    {
        "first": "third query",
        "prev": "",
        "lvl": 0,
        "num": 2,
        "query": "third query",
        "queryUid": "2bc8ed",
        "orig": "third query"
    }
]
```
During batch processing, results must be written to the results.bulkResults array, where each element is a results object. The elements of results.bulkResults are arranged in the same order as the queries in set.bulkQueries.
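Putting it together, here is a minimal sketch of a bulk-processing scraper; only bulkQueries, set.bulkQueries, and results.bulkResults come from the description above, while the per-query fetch logic simply mirrors the earlier example:

```js
static defaultConf = {
    version: '0.0.1',
    // take 3 queries from the queue per parse() call
    bulkQueries: 3,
    results: {
        flat: [
            ['title', 'HTML title'],
        ]
    },
    results_format: '$query: $title\\n',
};

async parse(set, results) {
    results.bulkResults = [];
    // process every query of the current batch, keeping the original order
    for (const query of set.bulkQueries) {
        const result = { title: '', success: 0 };
        const response = await this.request('GET', query.query, {}, {
            decode: 'auto-html',
        });
        if (response.success) {
            const matches = response.data.match(/<title>(.*?)<\/title>/i);
            if (matches)
                result.title = matches[1];
        }
        result.success = response.success;
        results.bulkResults.push(result);
    }
    // assumption: the top-level success flag is still set, as in single-query mode
    results.success = 1;
    return results;
}
```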
Useful Links
📄️ bulkQueries Example
Example of using bulkQueries with an internal scraper call
🔗 Examples and Discussion
Forum thread with examples and discussion of JS scraper functionality
🔗 JS Scraper Catalog
Section in the resource catalog dedicated to JS scrapers
🔗 Overview of basic ES6 features
An article on Habrahabr covering the basic features of ES6