Threads setup

Sep 14, 2017
  • A-Parser is built around multithreaded data processing. The parser runs tasks in parallel in separate threads, and the number of threads can be adjusted flexibly to match the configuration of the server.
    Let's see what threads are in practice. Suppose you need to prepare a report covering three months. A single accountant has to work through the months one after another, while three accountants can each take one month and work at the same time.
    As this example shows, multithreaded work gets the task done faster, but at the same time it requires more resources (3 accountants instead of 1).
    Multithreading in A-Parser works the same way. Let's say you need to parse information from several links:
    1. with one thread, the application parses each site in turn
    2. with multiple threads, each thread processes its own link and then moves on to the next one in the list of unprocessed links
    Thus, in the second case the whole task finishes much faster, but it requires more server resources, so it is recommended that you follow the System requirements.
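
    The difference between the two cases can be illustrated with a minimal Python sketch. This is not A-Parser code: parse_link is a hypothetical stand-in that only sleeps to imitate waiting for a site, and the links are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def parse_link(url: str) -> str:
    """Hypothetical stand-in for real parsing: sleep to imitate network wait."""
    time.sleep(0.1)
    return f"parsed {url}"


def run(links, threads):
    """Process all links with a fixed number of worker threads.

    Each worker takes the next unprocessed link as soon as it is free.
    Returns the results (in input order) and the elapsed time in seconds.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(parse_link, links))
    return results, time.perf_counter() - start


links = [f"https://example.com/page{i}" for i in range(6)]

one_results, one_time = run(links, threads=1)    # each link in turn
many_results, many_time = run(links, threads=3)  # three links at a time

print(f"1 thread:  {one_time:.2f} s")
print(f"3 threads: {many_time:.2f} s")
```

    Both runs return the same results; the run with three threads simply spends less time waiting, at the cost of keeping more threads busy.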


    Threads setup

    Threads in A-Parser are configured separately for each task, depending on the parameters required to run it. By default, two config presets are available: one with 20 threads (default) and one with 100 threads (100 Threads). To open the settings of the selected config, click the pencil icon.
    You can also reach the threads settings through the menu item Settings - Configs Presets.

    Here we can:
    • create a new config with your own settings and save it under a new name (Save As New button)
    • make changes to an existing config by selecting it from the dropdown list (Save button). Note that the built-in config named default cannot be changed.
    Each setting is described in detail below.

    Threads count

    This parameter specifies the number of threads used by a task running with this config. Any number of threads can be set, but you need to take into account the capabilities of your server as well as the limit of your proxy tariff, if it has one. For example, with our proxies you cannot specify more threads than the selected tariff allows.
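
    As a small illustration of the tariff restriction, the sketch below caps a requested thread count at a proxy tariff limit. The function name and both parameters are hypothetical and are not part of A-Parser.

```python
from typing import Optional


def effective_threads(requested: int, tariff_limit: Optional[int] = None) -> int:
    """Return the thread count to use, capped by the proxy tariff limit.

    requested    -- the value you would like to put into Threads count
    tariff_limit -- maximum threads allowed by the proxy tariff,
                    or None if the tariff has no such restriction
    """
    if requested < 1:
        raise ValueError("at least one thread is required")
    if tariff_limit is None:
        return requested
    return min(requested, tariff_limit)


print(effective_threads(100, tariff_limit=50))
print(effective_threads(20))
```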

    Proxy Checkers

    This parameter lets you select a proxy checker with specific settings. You can select All, which means using all working proxies, or only the checkers you want to use in the task (multiple selection is available).

    Max threads per proxy

    Here you specify how many threads may use the same proxy at the same time. This lets you set up different schemes, for example 1 thread = 1 proxy.
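
    The idea behind this limit can be sketched in Python as follows. This is only an illustration under assumptions, not A-Parser's implementation: the ProxyPool class, the proxy names and the worker function are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ProxyPool:
    """Hands out proxies so that no proxy serves more than
    max_threads_per_proxy threads at the same time (hypothetical helper)."""

    def __init__(self, proxies, max_threads_per_proxy):
        self._limit = max_threads_per_proxy
        self._in_use = {proxy: 0 for proxy in proxies}
        self._cond = threading.Condition()
        self.peak = {proxy: 0 for proxy in proxies}  # highest concurrent use seen

    def acquire(self):
        """Block until some proxy has a free slot, then return it."""
        with self._cond:
            while True:
                for proxy, count in self._in_use.items():
                    if count < self._limit:
                        self._in_use[proxy] = count + 1
                        self.peak[proxy] = max(self.peak[proxy], count + 1)
                        return proxy
                self._cond.wait()

    def release(self, proxy):
        """Give the slot back and wake up one waiting thread."""
        with self._cond:
            self._in_use[proxy] -= 1
            self._cond.notify()


def worker(pool, job):
    proxy = pool.acquire()
    try:
        time.sleep(0.02)  # imitate a request sent through the proxy
        return job, proxy
    finally:
        pool.release(proxy)


# 4 threads, 2 proxies, 1 thread = 1 proxy: two threads always wait their turn.
pool = ProxyPool(["proxy-a", "proxy-b"], max_threads_per_proxy=1)
with ThreadPoolExecutor(max_workers=4) as executor:
    done = list(executor.map(lambda job: worker(pool, job), range(8)))
```

    With a limit of 1, a proxy is never shared by two threads at once, so extra threads stay idle until a proxy is released. Raising the limit lets more threads work through the same set of proxies.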

    Global proxy ban

    All tasks running with this option enabled share a common proxy database. Specifically, the list of banned proxies for each parser is shared by all running tasks.
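
    The effect of a shared ban list can be shown with a short Python sketch. The BanList and Task classes, the parser name ExampleParser and the proxy names are hypothetical and only model the behavior described above.

```python
from collections import defaultdict


class BanList:
    """Banned proxies, kept separately for each parser (hypothetical model)."""

    def __init__(self):
        self._banned = defaultdict(set)  # parser name -> set of banned proxies

    def ban(self, parser, proxy):
        self._banned[parser].add(proxy)

    def is_banned(self, parser, proxy):
        return proxy in self._banned[parser]


class Task:
    """A task that filters its proxies through the ban list it was given."""

    def __init__(self, name, ban_list):
        self.name = name
        self.ban_list = ban_list

    def usable_proxies(self, parser, proxies):
        return [p for p in proxies if not self.ban_list.is_banned(parser, p)]


proxies = ["proxy-1", "proxy-2", "proxy-3"]

global_bans = BanList()
task_a = Task("a", global_bans)  # option enabled: shared ban list
task_b = Task("b", global_bans)  # option enabled: same shared ban list
task_c = Task("c", BanList())    # option disabled: private ban list

# Task a bans a proxy for one parser; task b sees the ban, task c does not.
task_a.ban_list.ban("ExampleParser", "proxy-1")

print(task_b.usable_proxies("ExampleParser", proxies))
print(task_c.usable_proxies("ExampleParser", proxies))
```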

    Max connections per host

    This parameter sets the maximum number of connections per host and is intended to reduce the load on a site while parsing information from it. In effect, it lets you control the number of simultaneous requests to each specific domain. The option applies to the task; if you run several tasks simultaneously with the same threads config, the limit is counted across all of those tasks.
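
    A per-host connection limit of this kind is commonly built from one semaphore per host. The Python sketch below shows the idea under assumptions: the fetch function only sleeps instead of making a real request, and all names and URLs are hypothetical rather than taken from A-Parser.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

MAX_CONNECTIONS_PER_HOST = 2  # assumed value, for illustration only

_lock = threading.Lock()
# One semaphore per host, created on first use.
_host_slots = defaultdict(lambda: threading.BoundedSemaphore(MAX_CONNECTIONS_PER_HOST))
active = defaultdict(int)  # connections currently open per host
peak = defaultdict(int)    # highest number of simultaneous connections per host


def fetch(url):
    """Imitate a request while respecting the per-host connection limit."""
    host = urlparse(url).hostname
    with _lock:
        slot = _host_slots[host]
    with slot:  # waits here if the host already has the maximum connections
        with _lock:
            active[host] += 1
            peak[host] = max(peak[host], active[host])
        time.sleep(0.02)  # stand-in for the real network request
        with _lock:
            active[host] -= 1
    return url


urls = [f"https://example.com/{i}" for i in range(6)]
urls += [f"https://example.org/{i}" for i in range(6)]

# 8 threads are available, but each host never sees more than 2 at once.
with ThreadPoolExecutor(max_workers=8) as executor:
    fetched = list(executor.map(fetch, urls))
```

    Because the semaphores live in one shared dictionary, every caller of fetch competes for the same slots, which mirrors the case where several tasks use the same config.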

    Reuse proxy between retries

    This setting turns off the check that each retry uses a unique proxy, and proxy banning will not work either. As a result, a single proxy can be used for all retries.
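
    The difference in proxy selection can be sketched in Python like this. The attempt_plan function is hypothetical and models only the uniqueness check, not banning or A-Parser's actual proxy rotation.

```python
def attempt_plan(proxies, retries, reuse_proxy):
    """Return the proxy chosen for each attempt of a single request.

    reuse_proxy=False -- every attempt must use a proxy not tried before
    reuse_proxy=True  -- the uniqueness check is skipped, so the same
                         proxy may serve every attempt
    """
    used = []
    for _ in range(retries):
        if reuse_proxy:
            candidates = proxies
        else:
            candidates = [p for p in proxies if p not in used]
        if not candidates:
            break  # no unused proxy left for another attempt
        used.append(candidates[0])
    return used


print(attempt_plan(["p1", "p2", "p3"], retries=3, reuse_proxy=False))
print(attempt_plan(["p1", "p2", "p3"], retries=3, reuse_proxy=True))
print(attempt_plan(["p1"], retries=3, reuse_proxy=False))
```

    With the check in place, a single proxy allows only one attempt; with reuse enabled, the same proxy can carry all retries.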

    Recommendations

    This article covers all the settings that are used to manage threads. Note that when setting up a threads config you do not have to change every parameter described here; it is enough to set only those that are needed to get a correct result. Typically you only need to change Threads count, and the other settings can be left at their defaults.