Proxy setup
A-Parser usually needs proxies to work well, so it has first-class support for proxies of various types and configurations. It can also work with several proxy sources at once, either within a single task or split by type across different tasks.
The main features of A-Parser for working with proxies:
- Simultaneous support for HTTP, SOCKS4 and SOCKS5 proxies
- Multithreaded proxy checking
- Loading proxies from a local file
- Multithreaded loading from external sources
- Checking for anonymity
- Support for login/password authentication for both HTTP and SOCKS5 proxies, including separate credentials for each proxy in the format login:password@ip:port
- Ability to specify arbitrary regular expressions for the IP address and port of the proxy when parsing from external sources
- Ability to export verified proxies to a file
- Ability to use multiple proxy sources in one task
- Support for domain proxies in the formats domain:port and login:password@domain:port
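The proxy line formats listed above can be illustrated with a small parser. This is a minimal sketch, not A-Parser's own code: the `Proxy` type and the `parse_proxy` helper are hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Optional


class Proxy(NamedTuple):
    host: str                      # IP address or domain name
    port: int
    login: Optional[str] = None
    password: Optional[str] = None


def parse_proxy(line: str) -> Proxy:
    """Parse 'host:port' or 'login:password@host:port' (hypothetical helper)."""
    address = line.strip()
    login = password = None
    if "@" in address:
        # Split on the last '@' so a password containing '@' still works.
        credentials, address = address.rsplit("@", 1)
        login, _, password = credentials.partition(":")
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"not a valid proxy line: {line!r}")
    return Proxy(host, int(port), login, password)


print(parse_proxy("127.0.0.1:8080"))
print(parse_proxy("user:secret@proxy.example.com:1080"))
```

The same function handles IP and domain proxies, because only the port and the optional credentials are separated out of the line.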
File structure
The working files of a proxy checker are located in the files/proxy/<proxy checker name> folder:
- proxy.txt - proxies are loaded from this file; put your list of proxies here
- sites.txt - put your list of proxy sources in this file (links to proxies, one link per line)
- alive.txt - live proxies are saved to this file every 5 seconds if the corresponding option is enabled
- regex.txt - this file contains a list of regular expressions for parsing proxies from external sources (one regular expression per line; $1 should be the IP address, $2 the port)
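To make the $1/$2 convention for regex.txt concrete, here is a minimal sketch of how such an expression pulls proxies out of a page. The pattern and the sample page are illustrative assumptions, and the code uses Python's `re` module, whose syntax may differ in details from the regex flavor A-Parser uses.

```python
import re

# Hypothetical pattern of the kind one might put in regex.txt:
# group 1 ($1) captures the IP address, group 2 ($2) captures the port.
PATTERN = re.compile(r"(\d+\.\d+\.\d+\.\d+)[:\s]+(\d+)")


def extract_proxies(page):
    """Return (ip, port) pairs found in the page text."""
    return PATTERN.findall(page)


sample_page = "10.0.0.1:3128\nsome text\n192.168.1.20 8080\n"
for ip, port in extract_proxies(sample_page):
    print(f"{ip}:{port}")
```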
If you have links to proxy sources, specify them in the sites.txt file, and leave the proxy.txt file empty.
For the "default" proxy checker, the files are located in the root directory files/proxy/.
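The folder rule described above can be sketched as a small path helper. This is only an illustration: `checker_dir` and `checker_files` are hypothetical names, and the installation root passed in is an assumption of the example.

```python
from pathlib import Path

CHECKER_FILES = ("proxy.txt", "sites.txt", "alive.txt", "regex.txt")


def checker_dir(root, name):
    """Return the working folder of a proxy checker (hypothetical helper).

    The "default" checker uses files/proxy/ itself; any other checker
    uses a subfolder named after it.
    """
    base = Path(root) / "files" / "proxy"
    return base if name == "default" else base / name


def checker_files(root, name):
    """List the working files of a checker as POSIX-style paths."""
    return [(checker_dir(root, name) / f).as_posix() for f in CHECKER_FILES]


print(checker_files("aparser", "default"))
print(checker_files("aparser", "socks"))
```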
Management
Proxy checkers are managed in the Proxy Checker tab, where you can add, delete, and enable/disable proxy checkers. This tab also displays statistics on the operation of each proxy checker, a graph of live proxies, and statistics on the processing of sources.
Adding and configuring a proxy checker
Go to the "Proxy Checker" menu and click "Add Checker" or select "Edit" from the drop-down menu in an existing proxy checker. You will be taken to the proxy checker settings page.
If necessary, set the required number of threads for checking proxies (Check threads), select the proxy type (Proxy type), and change other settings. The default parameter values are suitable for most tasks. Save the settings as a new proxy checker. You cannot change and save the settings of the default proxy checker.
Proxy sources are specified in files inside the folder with the name of the created proxy checker (files/proxy/.../):
- links in sites.txt
- a list of proxies in proxy.txt
Setting up proxies from the Members Area
To use proxies from A-Parser, go to the Members Area, to the Proxy tab, click the Use IP button, and click Save.
When using proxies from A-Parser, it is enough to disable proxy checking (tick the No check proxies box); the other settings can be left at their default values. Save the preset (click Save for an existing one, or Add new for a new one).
Go back to the "Proxy Checker" menu and make sure the checker you just created is enabled; if it is not, enable it.
Open the folder of the proxy checker specified in the "Working directory" field.
Copy one of the following proxy list links from the Members Area and paste it into the sites.txt file of the created proxy checker:
- http://work.a-poster.info/prx/perm_socks.txt - Each port has its own proxy with its own output IP address. The proxy is fixed to its port while it is online. This list is updated every 30 seconds and always contains current and live proxies. Recommended for most tasks.
- http://work.a-poster.info/prx/rand_socks.txt - The output IP address changes for each connection to the proxy. The IP address is randomly selected from all live proxies. This list is fixed and does not need to be updated.
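If you prefer to prepare the checker folder with a script rather than by hand, the step above might look like the sketch below. It assumes the checker folder layout described earlier; the `use_single_source` helper and the checker name `members` are hypothetical, and the demo writes into a temporary directory rather than a real A-Parser installation.

```python
import tempfile
from pathlib import Path

PERM_SOCKS = "http://work.a-poster.info/prx/perm_socks.txt"


def use_single_source(checker_folder, link):
    """Put exactly one source link in sites.txt and leave proxy.txt empty."""
    folder = Path(checker_folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "sites.txt").write_text(link + "\n", encoding="utf-8")
    (folder / "proxy.txt").write_text("", encoding="utf-8")


# Demo in a throwaway directory; "members" is a made-up checker name.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "files" / "proxy" / "members"
    use_single_source(folder, PERM_SOCKS)
    print((folder / "sites.txt").read_text(encoding="utf-8"), end="")
```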
Return to the "Proxy Checker" menu in A-Parser. The "Total alive" field of this proxy checker should be greater than 0, which means that the proxies are set up correctly.
⏩ Video: setting up, adding proxies, running a task
Using proxies with authorization
List of proxies with the same login and password for all proxies
This method is suitable when the proxy list has the format ip:port and the login/password is the same for the entire proxy list.
In the checker settings, specify:
- login
- password
- Use proxy authorization
List of proxies with different passwords for each proxy
In this case, the proxy list should have the format login:password@ip:port. In the checker settings, it is enough to enable Use proxy authorization.
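The difference between the two modes can be sketched as follows. This is an illustration only: `effective_credentials` is a hypothetical helper, and the rule that credentials embedded in a line take precedence over the shared login/password is an assumption of the sketch, not documented A-Parser behavior.

```python
def effective_credentials(proxy_line, shared_login=None, shared_password=None):
    """Return (address, login, password) for one line of the proxy list.

    Assumption: credentials embedded in the line win; otherwise the
    shared login/password from the checker settings are used.
    """
    line = proxy_line.strip()
    if "@" in line:
        credentials, address = line.rsplit("@", 1)
        login, _, password = credentials.partition(":")
        return address, login, password
    return line, shared_login, shared_password


# Same login/password for the whole list (ip:port lines):
print(effective_credentials("1.2.3.4:8080", "user", "pw"))
# Individual credentials per proxy (login:password@ip:port lines):
print(effective_credentials("bob:s3cret@1.2.3.4:8080"))
```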
⏩ Video: connecting proxies with authorization
Choosing a proxy checker for a task
These settings let you assign different proxy checkers to different tasks. You can skip this section if you want to use all available proxies in all tasks.
Go to the Settings -> Stream settings menu, select the desired preset or create a new one (the Add new button).
In the Proxy checkers field, select one or more proxy checkers (proxy checkers must be enabled to use them) and save (Save). You can also select all proxy checkers at once with All (the default value).
Now you can use the created Stream Config with the specified proxies in your tasks by selecting it in the Task Editor.
You can also override the proxy checker in each scraper using the override function - Proxy Checker.
The Exclude from "All" option in the proxy checker settings allows you to exclude its proxies from the general access in A-Parser. This option is useful in cases where it is necessary to make certain proxies available only from specific tasks or only for specific scrapers:
- for the task, it is necessary to forcibly select the excluded proxy checker
- for a specific scraper, it is necessary to set the use of the excluded proxy checker in the settings
Changes in logic
Previously, if a specific proxy checker was selected in the task and a different proxy checker was specified in the scraper, the scraper kept waiting for a proxy. Now the settings of the specific scraper take priority:
- "All" - uses all proxies selected for the task
- specific proxy checker - uses it, even if it is not selected in the task
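The selection rules described in this section can be summarized in a short sketch. The function names and the dictionary of checkers are hypothetical; the code only models the rules stated above (the Exclude from "All" flag and the priority of scraper settings) and ignores details such as whether a checker is enabled.

```python
ALL = "All"


def checkers_for_task(task_choice, checkers):
    """Resolve the task-level selection.

    checkers maps a checker name to its Exclude from "All" flag.
    task_choice is either "All" or a list of checker names.
    """
    if task_choice == ALL:
        return [name for name, excluded in checkers.items() if not excluded]
    return list(task_choice)


def checkers_for_scraper(scraper_choice, task_checkers):
    """Resolve the scraper-level override on top of the task selection."""
    if scraper_choice == ALL:
        return list(task_checkers)   # use everything selected for the task
    return [scraper_choice]          # a specific checker wins over the task


checkers = {"default": False, "private": True}
task = checkers_for_task(ALL, checkers)
print(task)
print(checkers_for_scraper("private", task))
```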
Proxy checker parameters
Parameter name | Default value | Description |
---|---|---|
Loading type | Replace | Determines whether to keep previously loaded proxies or not. Add always adds new proxies to the general list, Replace replaces old proxies with newly loaded ones. |
Load threads count | 5 | Number of threads for loading proxies from websites. |
Load interval | 30 | Interval between full re-checks of the list of websites. |
Load timeout | 30 | Timeout for a request to a website with proxies. |
Load max size | 524288 | Maximum size of a page with proxies. If the page is larger, it is truncated to the specified size. |
Load limit count | 0 | Limit on the number of loaded proxies, 0 to disable. |
No check proxies | ☐ | Allows you to disable proxy checking. All loaded proxies are automatically considered alive. |
Proxies type | HTTP, SOCKS5 | Select which types of proxies to check and in what order. If both HTTP and SOCKS are specified at the same time, if the HTTP check fails, the proxy will be re-checked for the SOCKS protocol. |
Check threads | 15 | Number of threads for checking proxies. |
Check url | http://work.a-poster.info:25000/ | Link to the script for checking proxies. Currently, the check is carried out through the parser's server, in the future this behavior may change. |
Check interval | 30 | Interval between full re-checks of all proxies. |
Check timeout | 5 | Proxy timeout. |
Check max size | 5120 | Maximum size of the downloaded page when checking proxies. |
Check anonymous | ☐ | Check proxies for anonymity. If selected, External IP must be specified. |
External IP | - | External IP address of the computer/server. Must be specified if the Check anonymous option is enabled. |
Exclude from "All" | ☐ | By default, each scraper has "All" selected as the proxy checker, i.e. all available proxy checkers are used. If this option is enabled, the proxy checker will be excluded from All. |
Save alive proxies to file | No | Save alive proxies to the file files/proxy/alive.txt. |
Use proxy authorization | ☐ | Use authorization for proxies by login/password. |
Authorization login | - | Login for authorization. |
Authorization password | - | Password for authorization. |
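The fallback described for the Proxies type parameter (try the listed types in order, moving to the next one when a check fails) can be sketched like this. The `detect_proxy_type` function and the `check` callback are hypothetical; a real check would make a network request through the proxy, which is replaced here by a stand-in.

```python
def detect_proxy_type(proxy, types, check):
    """Return the first type in `types` for which check(proxy, type) succeeds."""
    for proxy_type in types:
        if check(proxy, proxy_type):
            return proxy_type
    return None  # the proxy failed every check


# Stand-in for a real network check: pretend this proxy only speaks SOCKS5.
def fake_check(proxy, proxy_type):
    return proxy_type == "SOCKS5"


print(detect_proxy_type("10.0.0.1:1080", ["HTTP", "SOCKS5"], fake_check))
```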
Installing the verification script on hosting
By default, A-Parser checks proxies through its own verification script, so you do not need to install a script on your hosting.
To use your own script instead, upload the following PHP script to your hosting or server and specify the link to it in Check url:
<?php
print_r($_SERVER);
print_r($_POST);
?>
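Because the script simply prints the request environment, its output can be searched for your own address to judge anonymity. The sketch below shows one way this could be done; how A-Parser actually evaluates the response is not described here, so the `leaks_external_ip` helper and the sample output are assumptions made for illustration.

```python
import re


def leaks_external_ip(script_output, external_ip):
    """Return True if external_ip appears as a whole address in the output."""
    # The digit guards keep 198.51.100.2 from matching inside 198.51.100.23.
    pattern = r"(?<!\d)" + re.escape(external_ip) + r"(?!\d)"
    return re.search(pattern, script_output) is not None


# Made-up example of what print_r($_SERVER) might show behind a proxy
# that forwards the client's address in a header.
sample_output = """Array
(
    [REMOTE_ADDR] => 203.0.113.7
    [HTTP_X_FORWARDED_FOR] => 198.51.100.23
)
"""

print(leaks_external_ip(sample_output, "198.51.100.23"))
print(leaks_external_ip(sample_output, "192.0.2.1"))
```

In this sketch, a proxy counts as anonymous when the function returns False for the address given in External IP.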