Proxy Checkers
This section shows statistics for all proxy checkers. Each proxy checker, when enabled, is a continuously running module that checks proxies and so maintains an up-to-date list of live proxies.
You can add an unlimited number of proxy checkers and select one or several of them for each task or even each scraper in the task. Thus, it is possible to use one set of proxies for parsing Google and a completely different set for Yandex within the same task.
At the top, the total number of live proxies and the number of running (working) proxy checkers are displayed. At the top right, there is a button to add a new proxy checker. The procedure for adding proxy checkers is described in the Proxy Settings section.
Below is a list of all existing proxy checkers in the form of cards with information about each proxy checker. The following information is displayed on each card:
- Working directory - folder with files of the proxy checker in aparser/files/proxy
- Update time - the time of the last check of the uploaded proxy list
- Number of proxies in the check queue and the total number of uploaded proxies
- Number of live proxies
- Download status or date of the next download from proxy sources
- Number of sources from which proxies were last successfully downloaded and the total number of sources in this proxy checker
- The current status of proxy checking
The Enabled checkbox next to the proxy checker control buttons allows you to enable/disable the proxy checker.
The first in the list of proxy checkers is always the default proxy checker. It serves as a template for new proxy checkers and cannot be edited or deleted.
File Structure
The working files of the proxy checker are located in the folder files/proxy/<name of the proxy checker>:
- proxy.txt - proxies are loaded from this file; put the list of proxies here
- sites.txt - put the list of proxy sources (links to proxies, one link per line) in this file
- alive.txt - live proxies are saved to this file every 5 seconds if the corresponding option is enabled
- regex.txt - this file contains a list of regular expressions for parsing proxies from external sources (one regular expression per line; $1 should be the IP address, $2 the port)
If you have links to proxy sources, specify them in the sites.txt file and leave the proxy.txt file empty.
For the "default" proxy checker, the files are located in the root directory files/proxy/
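The role of the capture groups in regex.txt can be illustrated with a small sketch. It uses Python's re module, which is an assumption made for illustration only: the regular expression dialect A-Parser uses is not stated here, and both the pattern and the function name extract_proxies are hypothetical. Group 1 plays the role of $1 (the IP address) and group 2 the role of $2 (the port).

```python
import re

# Hypothetical pattern in the style of a regex.txt line:
# group 1 captures the IP address, group 2 captures the port.
PROXY_PATTERN = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})")


def extract_proxies(page_text: str) -> list[str]:
    """Return unique ip:port strings found in page_text, in order of appearance."""
    seen = set()
    result = []
    for ip, port in PROXY_PATTERN.findall(page_text):
        proxy = f"{ip}:{port}"
        if proxy not in seen:
            seen.add(proxy)
            result.append(proxy)
    return result


if __name__ == "__main__":
    sample = "<td>10.0.0.1:8080</td> text 192.168.1.5:3128 and again 10.0.0.1:8080"
    for proxy in extract_proxies(sample):
        print(proxy)
```

A source page that lists the IP address and the port in separate table cells would need a different pattern, but the same convention applies: the first group is the address and the second is the port.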
Adding and Configuring a Proxy Checker
Go to the "Proxy Checker" menu and click "Add Checker" or select "Edit" from the drop-down menu in an existing proxy checker. You will be taken to the proxy checker settings page.
If necessary, set the required number of threads for checking proxies (Checking Threads), select the type of proxy (Proxy Type), and change other settings. The default parameter values are suitable for most tasks. Save the settings as a new proxy checker. It is not possible to change and save the settings of the default proxy checker.
Proxy sources are specified in the files inside the folder with the name of the created proxy checker (files/proxy/.../):
- links in sites.txt
- list of proxies in proxy.txt
Proxies with IP access
Proxies with IP access are configured in a similar way.
List of proxies with the same login and password for all proxies
This method is suitable for cases when the list of proxies is in the ip:port format and the login/password is the same for the entire list of proxies.
In the checker settings, we specify:
- login
- password
- Use proxy authorization
List of proxies with different passwords for each proxy
In this case, the list of proxies should be in the format login:password@ip:port; in the checker settings it is enough to enable Use proxy authorization.
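The two list formats described above, ip:port and login:password@ip:port, can be told apart with a short parser. This is a minimal sketch for preparing or validating a proxy.txt file, not A-Parser's own parsing code; the names ProxyEntry and parse_proxy_line are hypothetical.

```python
from typing import NamedTuple, Optional


class ProxyEntry(NamedTuple):
    host: str
    port: int
    login: Optional[str]
    password: Optional[str]


def parse_proxy_line(line: str) -> ProxyEntry:
    """Parse 'ip:port' or 'login:password@ip:port' into a ProxyEntry."""
    line = line.strip()
    login = password = None
    if "@" in line:
        # Split on the last '@' so a password containing '@' stays intact.
        credentials, line = line.rsplit("@", 1)
        login, password = credentials.split(":", 1)
    host, port = line.rsplit(":", 1)
    return ProxyEntry(host, int(port), login, password)


if __name__ == "__main__":
    print(parse_proxy_line("10.0.0.1:8080"))
    print(parse_proxy_line("user:secret@10.0.0.1:8080"))
```

Entries without credentials come back with login and password set to None, which corresponds to the case where the shared login and password are taken from the checker settings instead.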
Choosing a proxy checker for a task
These settings let different tasks work with different proxy checkers. You can skip this section if all tasks should use all available proxies.
Go to Settings -> Threads Settings and select the required preset or create a new one (the Add new button).
In the Proxy Checkers field, select one or several proxy checkers (they must be enabled to be used) and click Save. You can also select all proxy checkers at once with All (the default value).
Now you can use the created Threads Config with the specified proxies in your tasks by selecting it in the Task Editor.
You can also override the proxy checker in each scraper using the override function - Proxy Checker.
The Exclude from "All" option in the proxy checker settings allows you to exclude its proxies from general use in A-Parser. This option is useful in cases where you need to make certain proxies available only from specific tasks or only for specific scrapers:
- for the task, you must forcibly select the excluded proxy checker
- for a specific scraper, it is necessary to set the use of the excluded proxy checker in the settings
Changes in logic
Previously, if a specific proxy checker was selected in the task and a different proxy checker was specified in the scraper, the scraper waited for a proxy. Now the settings of a specific scraper take priority:
- "All" - uses all proxies selected for the task
- a specific proxy checker - uses it, even if it is not selected in the task
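The priority rules above, together with the Exclude from "All" option, can be sketched as a small resolution function. This is an illustrative model under stated assumptions, not A-Parser's implementation: the Checker class, the resolve_checkers function, and the treatment of disabled checkers as contributing no proxies are all hypothetical.

```python
from dataclasses import dataclass

ALL = "All"


@dataclass(frozen=True)
class Checker:
    name: str
    enabled: bool = True
    exclude_from_all: bool = False


def resolve_checkers(checkers, task_selection, scraper_override=ALL):
    """Return the names of the checkers whose proxies a scraper may use.

    task_selection is either ALL or a list of checker names;
    scraper_override is either ALL or a single checker name.
    """
    by_name = {c.name: c for c in checkers}

    def expand(selection):
        if selection == ALL:
            # "All" skips checkers that are disabled or excluded from "All".
            return [c.name for c in checkers if c.enabled and not c.exclude_from_all]
        # An explicit selection may include an excluded checker.
        return [n for n in selection if n in by_name and by_name[n].enabled]

    if scraper_override == ALL:
        # The scraper inherits whatever the task selected.
        return expand(task_selection)
    # A specific checker on the scraper wins, even if the task did not select it.
    return expand([scraper_override])


if __name__ == "__main__":
    checkers = [
        Checker("default"),
        Checker("google"),
        Checker("private", exclude_from_all=True),
    ]
    print(resolve_checkers(checkers, ALL))
    print(resolve_checkers(checkers, ["google"], "private"))
```

In this model an excluded checker is reachable only when it is named explicitly, either in the task selection or in the scraper override.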
Proxy checker parameters
Parameter Name | Default Value | Description |
---|---|---|
Loading type | Replace | Defines whether previously loaded proxies are kept. Add - always adds new proxies to the general list; Replace - replaces old proxies with newly loaded ones |
Load threads count | 5 | Number of threads for loading proxies from websites |
Load interval | 30 | Interval between full rechecks of the proxy list |
Load timeout | 30 | Timeout for a request to the proxy site |
Load max size | 524288 | Maximum size of a page with proxies; a larger page is trimmed to the specified size |
Load limit count | 0 | Limit on the number of proxies to be loaded, 0 to disable |
No check proxies | ☐ | Allows disabling proxy checking. All loaded proxies are automatically considered alive |
Proxies type | HTTP, SOCKS5 | Which types of proxies to check and in what sequence. If both HTTP and SOCKS are specified, a proxy that fails the HTTP check is rechecked using the SOCKS protocol |
Check threads | 15 | Number of threads for checking proxies |
Check url | http://work.a-poster.info:25000/ | Link to the proxy checking script. Currently the check is carried out through the scraper server; this behavior may change in the future |
Check interval | 30 | Interval between full rechecks of all proxies |
Check timeout | 5 | Proxy timeout |
Check max size | 5120 | Maximum download page size when checking proxies |
Check anonymous | ☐ | Check proxies for anonymity; if selected, External IP must be specified |
External IP | | External IP address of the computer/server, must be specified if the Check anonymous option is enabled |
Exclude from "All" | ☐ | By default, in each scraper, the value "All" is selected as the proxy checker, i.e., all available proxy checkers are used. If the option is enabled, the proxy checker will be excluded from All. |
Save alive proxies to file | No | Save alive proxies to the file files/proxy/alive.txt |
Use proxy authorization | ☐ | Use authorization for proxies by login/password |
Authorization login | | Login for authorization |
Authorization password | | Password for authorization |
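The sequential behavior described for the Proxies type parameter (try HTTP first, then fall back to SOCKS if the HTTP check fails) can be sketched as follows. This is only a model of the described order: the function first_working_protocol is hypothetical, and the actual network check is passed in as a caller-supplied function so that the sketch makes no claims about how A-Parser performs it.

```python
from typing import Callable, Optional, Sequence


def first_working_protocol(
    proxy: str,
    protocols: Sequence[str],
    check: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each protocol in order and return the first one that passes check.

    check(proxy, protocol) is supplied by the caller and returns True
    when the proxy responds correctly over that protocol.
    """
    for protocol in protocols:
        if check(proxy, protocol):
            return protocol
    # No protocol worked: the proxy is considered dead.
    return None


if __name__ == "__main__":
    # Stand-in check that pretends the proxy only speaks SOCKS5.
    def fake_check(proxy: str, protocol: str) -> bool:
        return protocol == "SOCKS5"

    print(first_working_protocol("10.0.0.1:1080", ["HTTP", "SOCKS5"], fake_check))
```

Because the protocols are tried in the configured order, a proxy that passes the first check is never tested against the later ones.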
Installing the hosting verification script
By default, A-Parser checks proxies through its own verification script, so there is no need to install the script on your own hosting.
To use your own script instead, upload the following PHP script to your hosting or server and specify the link to it in Check url:
<?php
print_r($_SERVER);
print_r($_POST);
?>
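Since the script above prints the request's server variables, its output can be used to judge anonymity: if the External IP of your own machine appears anywhere in the response received through a proxy, that proxy is revealing your address. The sketch below shows this idea only. It is an assumption about how such a check could work, not a description of A-Parser's internal logic, and the function name looks_anonymous and the sample response are hypothetical.

```python
import re


def looks_anonymous(response_body: str, external_ip: str) -> bool:
    """Return True if external_ip does not appear in the script's output."""
    if not external_ip:
        raise ValueError("external_ip must be specified for an anonymity check")
    # Digit boundaries prevent 1.2.3.4 from matching inside 11.2.3.45.
    pattern = rf"(?<!\d){re.escape(external_ip)}(?!\d)"
    return re.search(pattern, response_body) is None


if __name__ == "__main__":
    # Hypothetical fragment of print_r($_SERVER) output seen through a proxy.
    body = (
        "[REMOTE_ADDR] => 203.0.113.7\n"
        "[HTTP_X_FORWARDED_FOR] => 198.51.100.20\n"
    )
    print(looks_anonymous(body, "198.51.100.20"))
    print(looks_anonymous(body, "192.0.2.1"))
```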
And specify one of the proxy lists:
- **[http://work.a-poster.info/prx/perm_socks.txt](http://work.a-poster.info/prx/perm_socks.txt)** - Each port has its own proxy with its own outgoing IP address. The proxy is fixed to its port as long as it is online. This list is updated every 30 seconds and always contains current and live proxies.
- **[http://work.a-poster.info/prx/rand_socks.txt](http://work.a-poster.info/prx/rand_socks.txt)** - The outgoing IP address changes for each connection to the proxy. The IP address is chosen randomly from all live proxies. This list is fixed and there is no need to update it.