Proxy setup

A-Parser usually depends on proxies to scrape reliably, so it has first-class support for proxies of various types and configurations. It can also work with several proxy sources at once, either within a single task or split by type between different tasks.

The main features of A-Parser for working with proxies:

  • Simultaneous support for HTTP, SOCKS4 and SOCKS5 proxies
  • Multithreaded proxy checking
  • Loading proxies from a local file
  • Multithreaded loading from external sources
  • Checking for anonymity
  • Support for login/password authentication for both HTTP and SOCKS5 proxies, including per-proxy credentials in the format login:password@ip:port
  • Ability to specify arbitrary regular expressions for the IP address and port of the proxy when parsing from external sources
  • Ability to export verified proxies to a file
  • Ability to use multiple proxy sources in one task
  • Support for domain proxies in the formats domain:port and login:password@domain:port
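
The proxy line formats listed above can be illustrated with a small parser. This is only a sketch of how such lines might be split into parts, not A-Parser's own code; the `Proxy` type and the `parse_proxy_line` function are hypothetical names introduced for the example.

```python
from typing import NamedTuple, Optional


class Proxy(NamedTuple):
    host: str                       # IP address or domain name
    port: int
    login: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_line(line: str) -> Proxy:
    """Parse 'host:port' or 'login:password@host:port' into a Proxy."""
    line = line.strip()
    login = password = None
    if "@" in line:
        # Split on the last '@' so a password containing '@' still parses.
        credentials, line = line.rsplit("@", 1)
        login, _, password = credentials.partition(":")
    host, sep, port = line.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"not a valid proxy line: {line!r}")
    return Proxy(host, int(port), login, password)
```

For example, `parse_proxy_line("user:secret@proxy.example.com:1080")` yields the host `proxy.example.com`, port `1080`, and the credentials as separate fields, while a plain `1.2.3.4:8080` line leaves the login and password empty.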

File structure

The working files of the proxy checker are located in the files/proxy/<proxy checker name> folder:

  • proxy.txt - the list of proxies to load; put your proxies here
  • sites.txt - the list of proxy sources (links to proxy lists, one link per line)
  • alive.txt - live proxies are saved to this file every 5 seconds if the corresponding option is enabled
  • regex.txt - a list of regular expressions for parsing proxies from external sources (one regular expression per line; $1 must capture the IP address and $2 the port)
info

If you have links to proxy sources, specify them in the sites.txt file, and leave the proxy.txt file empty.

note

For the "default" proxy checker, the files are located in the root directory files/proxy/.
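
To make the regex.txt convention more concrete (first capture group is the IP address, second is the port), here is a minimal sketch in Python. The pattern is a hypothetical example, not one shipped with A-Parser, and Python's `re` syntax may differ in details from the regex flavor A-Parser uses.

```python
import re

# Hypothetical pattern in the spirit of a regex.txt entry:
# group 1 captures the IP address, group 2 captures the port.
PROXY_PATTERN = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})[:\s]+(\d{2,5})")


def extract_proxies(page_text: str) -> list:
    """Return every 'ip:port' pair found in the text of a proxy source page."""
    return [f"{ip}:{port}" for ip, port in PROXY_PATTERN.findall(page_text)]
```

A pattern like this accepts both `10.0.0.1:3128` and a whitespace-separated `192.168.1.5 8080`; sources that wrap the address and port in markup would need their own expression.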

Management

Proxy checkers are managed in the Proxy Checker tab, where you can add, delete, and enable/disable proxy checkers. This tab also displays statistics on the operation of each proxy checker, a graph of live proxies, and statistics on the processing of sources.

Adding and configuring a proxy checker

Go to the "Proxy Checker" menu and click "Add Checker" or select "Edit" from the drop-down menu in an existing proxy checker. You will be taken to the proxy checker settings page.

If necessary, set the required number of threads for checking proxies (Check threads), select the proxy type (Proxy type), and change other settings. The default parameter values are suitable for most tasks. Save the settings as a new proxy checker. You cannot change and save the settings of the default proxy checker.

Proxy sources are specified in files inside the folder with the name of the created proxy checker (files/proxy/.../):

  • links in sites.txt
  • a list of proxies in proxy.txt

Setting up proxies from the Members Area

To use proxies from A-Parser, open the Proxy tab in the Members Area, click the Use IP button, and click Save.

When using proxies from A-Parser, it is enough to disable proxy checking (tick the No check proxies checkbox); the other settings can stay at their default values. Save the preset (click Save for an existing preset, Add new for a new one).

Go back to the "Proxy Checker" menu and check that the checker you just created is enabled; if it is not, enable it.

Open the folder of the proxy checker specified in the "Working directory" field.

Copy one of the proxy list links from the Members Area into the sites.txt file of the created proxy checker:

  • http://work.a-poster.info/prx/perm_socks.txt - Each port has its own proxy with its own output IP address. The proxy is fixed to its port while it is online. This list is updated every 30 seconds and always contains current and live proxies. Recommended for most tasks.
  • http://work.a-poster.info/prx/rand_socks.txt - The output IP address changes for each connection to the proxy. The IP address is randomly selected from all live proxies. This list is fixed and does not need to be updated.
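
The file edits in this step can also be scripted. The sketch below assumes you already know the checker's working directory (the folder shown in the "Working directory" field); the `write_checker_sources` helper is a hypothetical name, and it overwrites both sites.txt and proxy.txt, so treat it as an illustration rather than a tool to run blindly.

```python
from pathlib import Path

PERM_SOCKS = "http://work.a-poster.info/prx/perm_socks.txt"


def write_checker_sources(checker_dir, source_urls):
    """Write proxy source links to sites.txt and leave proxy.txt empty."""
    checker_dir = Path(checker_dir)
    checker_dir.mkdir(parents=True, exist_ok=True)
    # sites.txt holds one link per line.
    sites = "\n".join(source_urls) + "\n"
    (checker_dir / "sites.txt").write_text(sites, encoding="utf-8")
    # When proxies come from links, proxy.txt is left empty.
    (checker_dir / "proxy.txt").write_text("", encoding="utf-8")
    return checker_dir
```

A call such as `write_checker_sources("files/proxy/my_checker", [PERM_SOCKS])` would set up a checker folder named `my_checker` (an example name) with the permanent SOCKS list as its only source.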

Return to the "Proxy Checker" menu in A-Parser. The "Total alive" field of this proxy checker should be greater than 0, which means that the proxies are set up correctly.

⏩ Video: setting up, adding proxies, running a task

Using proxies with authorization

List of proxies with the same login and password for all proxies

This method is suitable for cases when the proxy list has the format ip:port and the login/password is the same for the entire proxy list.

In the checker settings, specify:

  • login
  • password
  • Use proxy authorization

List of proxies with different passwords for each proxy

In this case, the proxy list should have the format login:password@ip:port. In the checker settings, it is enough to enable Use proxy authorization.
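
If the credentials are stored separately from the addresses, the per-proxy format can be produced with a few lines of code. This is a minimal sketch; `with_credentials` and `build_proxy_file` are hypothetical helper names, and the input is assumed to be (ip:port, login, password) triples.

```python
def with_credentials(proxy_line, login, password):
    """Turn 'ip:port' into 'login:password@ip:port'."""
    proxy_line = proxy_line.strip()
    if "@" in proxy_line:
        raise ValueError(f"line already has credentials: {proxy_line!r}")
    return f"{login}:{password}@{proxy_line}"


def build_proxy_file(entries):
    """Build the text of a proxy list from (ip_port, login, password) triples."""
    return "\n".join(with_credentials(*entry) for entry in entries) + "\n"
```

The resulting text can be saved as proxy.txt in the checker's folder, one proxy per line.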

⏩ Video: connecting proxies with authorization

Choosing a proxy checker for a task

info

These settings let different tasks work with different proxy checkers. You can skip this section if all tasks should use all available proxies.

Go to the Settings -> Stream settings menu, select the desired preset or create a new one (the Add new button).

In the Proxy checkers field, select one or more proxy checkers (a proxy checker must be enabled to be used) and click Save. You can also select all proxy checkers at once with All (the default value).

Now you can use the created Stream Config with the specified proxies in your tasks by selecting it in the Task Editor.

You can also override the proxy checker in each scraper using the override function - Proxy Checker.

The Exclude from "All" option in the proxy checker settings allows you to exclude its proxies from the general access in A-Parser. This option is useful in cases where it is necessary to make certain proxies available only from specific tasks or only for specific scrapers:

  • for the task, it is necessary to forcibly select the excluded proxy checker
  • for a specific scraper, it is necessary to set the use of the excluded proxy checker in the settings

Changes in logic

Previously, if a specific proxy checker was selected in the task and a different proxy checker was specified in the scraper, the scraper was left waiting for a proxy. Now the settings of a specific scraper take priority:

  • "All" - uses all proxies selected for the task
  • specific proxy checker - uses it, even if it is not selected in the task
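
The selection rules above, together with the Exclude from "All" option, can be summarized in a short sketch. This models the described behavior under stated assumptions and is not A-Parser's implementation; the function name, argument names, and checker names are all hypothetical.

```python
ALL = "All"


def resolve_checkers(scraper_choice, task_choice, all_checkers, excluded_from_all=frozenset()):
    """Return the names of the proxy checkers a scraper ends up using.

    scraper_choice: ALL or the name of one specific proxy checker
    task_choice:    ALL or a list of proxy checker names selected for the task
    """
    if scraper_choice != ALL:
        # A specific checker set in the scraper wins, even if the task
        # did not select it.
        return [scraper_choice]
    if task_choice == ALL:
        # "All" skips checkers that have Exclude from "All" enabled.
        return [name for name in all_checkers if name not in excluded_from_all]
    # Scraper is set to "All": use whatever the task selected.
    return list(task_choice)
```

With checkers `default`, `private`, and `fast`, where `private` is excluded from "All", a task and scraper both set to "All" would use `default` and `fast`, while a scraper explicitly set to `private` would use only `private`.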

Proxy checker parameters

| Parameter name | Default value | Description |
|---|---|---|
| Loading type | Replace | Determines whether to keep previously loaded proxies. Add always adds new proxies to the general list; Replace replaces old proxies with newly loaded ones. |
| Load threads count | 5 | Number of threads for loading proxies from websites. |
| Load interval | 30 | Interval between full re-checks of the list of websites. |
| Load timeout | 30 | Timeout for a request to a website with proxies. |
| Load max size | 524288 | Maximum size of a page with proxies. If the page is larger, it is truncated to the specified size. |
| Load limit count | 0 | Limit on the number of loaded proxies, 0 to disable. |
| No check proxies | | Allows you to disable proxy checking. All loaded proxies are automatically considered alive. |
| Proxies type | HTTP, SOCKS5 | Select which types of proxies to check and in what order. If both HTTP and SOCKS are specified, a proxy that fails the HTTP check is re-checked for the SOCKS protocol. |
| Check threads | 15 | Number of threads for checking proxies. |
| Check url | http://work.a-poster.info:25000/ | Link to the script for checking proxies. Currently, the check is carried out through the parser's server; this behavior may change in the future. |
| Check interval | 30 | Interval between full re-checks of all proxies. |
| Check timeout | 5 | Proxy timeout. |
| Check max size | 5120 | Maximum size of the downloaded page when checking proxies. |
| Check anonymous | | Check proxies for anonymity. If selected, External IP must be specified. |
| External IP | - | External IP address of the computer/server. Must be specified if the Check anonymous option is enabled. |
| Exclude from "All" | | By default, each parser has "All" selected as the proxy checker, i.e. all available proxy checkers are used. If this option is enabled, the proxy checker will be excluded from All. |
| Save alive proxies to file | No | Save alive proxies to the file files/proxy/alive.txt. |
| Use proxy authorization | | Use authorization for proxies by login/password. |
| Authorization login | - | Login for authorization. |
| Authorization password | - | Password for authorization. |
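
The fallback described for the Proxies type parameter (try the listed protocols in order, moving on when a check fails) can be sketched as follows. The actual network check is left as a caller-supplied function, since how A-Parser performs it is not described here; `detect_proxy_type` and `check` are hypothetical names.

```python
def detect_proxy_type(proxy, proxy_types, check):
    """Return the first protocol in proxy_types for which check() succeeds.

    proxy:       proxy address, e.g. '10.0.0.1:1080'
    proxy_types: protocols in the order they should be tried
    check:       callable (proxy, protocol) -> bool, supplied by the caller
    Returns None when the proxy fails every protocol.
    """
    for protocol in proxy_types:
        if check(proxy, protocol):
            return protocol
    return None
```

With `proxy_types=["HTTP", "SOCKS5"]`, a proxy that only speaks SOCKS5 fails the first check and is then recognized on the second one; a proxy that fails both would be treated as dead.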

Installing the verification script on hosting

info

By default, A-Parser checks proxies through its own verification script, so you do not need to install the script on your own hosting.

Upload the following PHP script to your hosting or server and specify the link to it in Check url:

<?php

print_r($_SERVER);
print_r($_POST);

?>
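
Because the script echoes the request as the server sees it, its output can be used to judge anonymity: if your real address shows up anywhere in the response, the proxy leaked it. The sketch below illustrates that idea only; it is an assumption about how such a check might work, not a description of A-Parser's internal logic, and `is_anonymous` is a hypothetical name.

```python
import re


def is_anonymous(response_body: str, external_ip: str) -> bool:
    """Return True if external_ip does not appear in the script's output."""
    # Digit guards keep '198.51.100.2' from matching inside '198.51.100.20'.
    pattern = rf"(?<!\d){re.escape(external_ip)}(?!\d)"
    return re.search(pattern, response_body) is None
```

Here `external_ip` plays the role of the External IP setting: a proxy that forwards the real address in a header such as X-Forwarded-For would make it appear in the printed `$_SERVER` array, and the check would report the proxy as not anonymous.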

And specify one of the proxy lists:

  • http://work.a-poster.info/prx/perm_socks.txt - Each port has its own proxy with its own output IP address. The proxy is fixed to its port as long as it is online. This list is updated every 30 seconds and always contains up-to-date and live proxies.
  • http://work.a-poster.info/prx/rand_socks.txt - The output IP address changes for each connection to the proxy. The IP address is randomly selected from all live proxies. This list is fixed and does not need to be updated.