
A-Parser Integration with Redis: Advanced API

Comparison with HTTP API

The A-Parser Redis API was developed as a higher-performance replacement for the oneRequest and bulkRequest methods, and to support additional use cases:

  • Redis acts as the request and result server
  • ability to request results asynchronously or in blocking mode
  • ability to connect multiple parsers (on the same server or on different servers) to process requests through a single entry point
  • ability to set the number of threads for processing requests and to view operation logs
  • ability to set timeouts for operations
  • automatic expiration of unclaimed results

Forum discussion thread

Pre-setup

Unlike the A-Parser HTTP API, the Redis API requires you to first configure and run a task with the API::Server::Redis parser:

  • install and run a Redis server (locally or remotely)
  • create a settings preset for the API::Server::Redis parser, specifying:
    • Redis Host - Redis server address, default is 127.0.0.1
    • Redis Port - Redis server port, default is 6379
    • Redis Queue Key - name of the key used for data exchange with A-Parser, default is aparser_redis_api; you can create separate queues and process them with different tasks or different copies of A-Parser
    • Result Expire(TTL) - result lifetime in seconds, used to automatically delete unclaimed results, default is 3600 seconds (1 hour)
  • add a task with the API::Server::Redis parser
    • as queries, specify {num:1:N}, where N must equal the number of threads set in the task
    • you can also enable the logging option, which makes the log for each request available for viewing

Example of task configuration with API::Server::Redis

Getting API request

Running A-Parser together with Redis using docker-compose

With this launch method, you can specify the service name instead of an IP address as the Redis server address (Redis Host); in the examples below, the service name is redis.

If A-Parser has not been run via docker-compose before

  1. Download and unpack the distribution (a one-time link must first be obtained from the Members Area, as described here):
curl -O https://a-parser.com/members/onetime/ce42f308eaa577b5/aparser.tar.gz
tar zxf aparser.tar.gz
rm -f aparser.tar.gz
  2. Create a docker-compose.yml file and put the following content into it:

    • Basic version without a password or a published port: Redis is only available inside the Docker network
    version: '3'

    services:
      a-parser:
        image: aparser/runtime:latest
        command: ./aparser
        restart: always
        volumes:
          - ./aparser:/app
        ports:
          - 9091:9091

      redis:
        image: redis:latest
        restart: always
    • Version with a password and a published port: Redis is accessible from outside, so setting a password is strongly recommended
    version: '3'

    services:
      a-parser:
        image: aparser/runtime:latest
        command: ./aparser
        restart: always
        volumes:
          - ./aparser:/app
        ports:
          - 9091:9091

      redis:
        image: redis:latest
        restart: always
        command: redis-server --requirepass YOUR_REDIS_PASSWORD_HERE
        ports:
          - 6379:6379

    Replace YOUR_REDIS_PASSWORD_HERE with a password of your choice; it will be used to authenticate with Redis.

  3. Start the containers:

docker compose up -d

If A-Parser has already been run via docker-compose before

  1. Edit the docker-compose.yml file, adding the following service to the end of the services section:

    • Basic version without a password or a published port: Redis is only available inside the Docker network
      redis:
        image: redis:latest
        restart: always
    • Version with a password and a published port: Redis is accessible from outside, so setting a password is strongly recommended
      redis:
        image: redis:latest
        restart: always
        command: redis-server --requirepass YOUR_REDIS_PASSWORD_HERE
        ports:
          - 6379:6379

    Replace YOUR_REDIS_PASSWORD_HERE with a password of your choice; it will be used to authenticate with Redis.

  2. Start the containers:

docker compose up -d
Note: If A-Parser was already running and its configuration has not changed, it will not be restarted; Docker will simply add and start the Redis container.

Executing requests

The Redis API is built on Redis lists. List operations allow queueing a practically unlimited number of requests (bounded only by available RAM) and receiving results either in blocking mode with a timeout (blpop) or asynchronously (lpop).

  • all settings except useproxy, proxyChecker, and proxybannedcleanup are taken from the preset of the called parser, with overrideOpts applied on top
  • the useproxy, proxyChecker, and proxybannedcleanup settings are taken from the API::Server::Redis preset, with overrideOpts applied on top

A request is added to Redis with the lpush command; each request is an array [queryId, parser, preset, query, overrideOpts, apiOpts] serialized as JSON:

  • parser, preset, query correspond to the same parameters of the oneRequest API method
  • queryId - an identifier generated by the client together with the request; we recommend using a sequence number from your database or a sufficiently random value; the result is retrieved by this ID
  • overrideOpts - settings that override the parser preset
  • apiOpts - additional API processing parameters
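To make the format concrete, here is a minimal Python sketch that serializes a request as the six-element JSON array described above, using only the standard library. The helper names `build_request` and `result_key` are hypothetical; the default result key `<queue>:<queryId>` follows the redis-cli example in this section.

```python
import json
from typing import Optional


def build_request(query_id: str, parser: str, preset: str, query: str,
                  override_opts: Optional[dict] = None,
                  api_opts: Optional[dict] = None) -> str:
    """Serialize one request as [queryId, parser, preset, query, overrideOpts, apiOpts]."""
    payload = [query_id, parser, preset, query,
               override_opts or {}, api_opts or {}]
    return json.dumps(payload)


def result_key(queue_key: str, query_id: str) -> str:
    """Key under which the result is stored by default: <queue_key>:<query_id>."""
    return f"{queue_key}:{query_id}"


print(build_request("some_unique_id", "Net::HTTP", "default", "https://ya.ru"))
# prints: ["some_unique_id", "Net::HTTP", "default", "https://ya.ru", {}, {}]
print(result_key("aparser_redis_api", "some_unique_id"))
# prints: aparser_redis_api:some_unique_id
```

The string returned by `build_request` is what would be passed to lpush on the queue key.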
Note: When requesting via Redis, the result formatting stage is skipped, as the entire result is passed as JSON for further programmatic processing.

redis-cli

For testing, you can execute requests with redis-cli:

127.0.0.1:6379> lpush aparser_redis_api '["some_unique_id", "Net::HTTP", "default", "https://ya.ru"]'
(integer) 1
127.0.0.1:6379> blpop aparser_redis_api:some_unique_id 0
1) "aparser_redis_api:some_unique_id"
2) "{\"data\":\"<!DOCTYPE html><html.....
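The same exchange can be sketched in Python. This is a minimal sketch, assuming the redis-py client (`redis.Redis`), whose `lpush` and `blpop` methods mirror the commands above; `submit` and `wait_result` are hypothetical helper names, and any object exposing those two methods can be passed as `client`.

```python
import json
from typing import Any, Optional

QUEUE_KEY = "aparser_redis_api"  # default Redis Queue Key of the preset


def submit(client: Any, query_id: str, parser: str, preset: str, query: str,
           queue_key: str = QUEUE_KEY) -> None:
    """Push one request onto the A-Parser queue (the lpush above)."""
    client.lpush(queue_key, json.dumps([query_id, parser, preset, query]))


def wait_result(client: Any, query_id: str, timeout: int = 0,
                queue_key: str = QUEUE_KEY) -> Optional[dict]:
    """Block on <queue_key>:<query_id> (the blpop above).

    timeout is in seconds; 0 waits indefinitely. Returns None on timeout.
    """
    reply = client.blpop([f"{queue_key}:{query_id}"], timeout=timeout)
    if reply is None:
        return None
    _key, value = reply
    return json.loads(value)  # json.loads accepts both str and bytes
```

For example, with `client = redis.Redis(host="127.0.0.1", port=6379)`, calling `submit(client, "some_unique_id", "Net::HTTP", "default", "https://ya.ru")` and then `wait_result(client, "some_unique_id", timeout=15)` reproduces the session above, returning the decoded JSON (with its `data` field) or `None`.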

Use cases

Asynchronous check for the result

lpop aparser_redis_api:some_unique_id

Returns the result if the request has already been processed, or nil if it is still being processed.
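A minimal Python sketch of the asynchronous check, assuming a redis-py style client (any object whose `lpop(name)` returns `None` for an empty or missing list). `try_get_result` and `poll_result` are hypothetical helpers. Since lpop removes the element it returns, a result read this way is consumed and cannot be read a second time.

```python
import json
import time
from typing import Any, Callable, Optional


def try_get_result(client: Any, query_id: str,
                   queue_key: str = "aparser_redis_api") -> Optional[dict]:
    """Non-blocking check (lpop): the parsed result, or None if not ready yet."""
    raw = client.lpop(f"{queue_key}:{query_id}")
    return json.loads(raw) if raw is not None else None


def poll_result(client: Any, query_id: str, max_wait: float = 15.0,
                interval: float = 0.5, queue_key: str = "aparser_redis_api",
                sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Repeat try_get_result every `interval` seconds until `max_wait` elapses."""
    deadline = time.monotonic() + max_wait
    while True:
        result = try_get_result(client, query_id, queue_key)
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            return None
        sleep(interval)  # a real client would do other useful work here
```

Polling like this only pays off when the client has other work to do between checks; if it would just sleep, a single blpop is simpler and reacts faster.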

Blocking result retrieval

blpop aparser_redis_api:some_unique_id 0

This command blocks until the result is received. The last argument is the maximum wait time in seconds (0 means wait indefinitely); if it expires before the result arrives, the command returns nil.
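Because blpop accepts several keys and returns as soon as any of them has data, a client that has sent several requests can wait for all of them under one overall deadline. A minimal Python sketch, assuming a redis-py style `blpop(keys, timeout)` that returns a `(key, value)` pair or `None`; `collect_results` is a hypothetical name.

```python
import json
import math
import time
from typing import Any, Dict, Iterable


def collect_results(client: Any, query_ids: Iterable[str], timeout: float,
                    queue_key: str = "aparser_redis_api") -> Dict[str, dict]:
    """Wait for several results with a single overall deadline.

    Returns {query_id: result} for every result that arrived in time;
    ids missing from the dict timed out.
    """
    pending = {f"{queue_key}:{qid}": qid for qid in query_ids}
    results: Dict[str, dict] = {}
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Whole seconds and at least 1: a timeout of 0 would block forever.
        reply = client.blpop(list(pending), timeout=max(1, math.ceil(remaining)))
        if reply is None:  # nothing new arrived before the timeout
            break
        key, value = reply
        if isinstance(key, bytes):  # redis-py returns bytes unless decode_responses=True
            key = key.decode("utf-8")
        results[pending.pop(key)] = json.loads(value)
    return results
```

Rounding the remaining time up to whole seconds keeps the sketch compatible with servers that only accept integer timeouts, at the cost of possibly overshooting the deadline by less than a second.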

Saving results to a single queue

By default, A-Parser saves the result of each request under its own unique key, aparser_redis_api:query_id. This makes multi-threaded processing straightforward: each thread sends its own requests and receives its own results.

In some cases, results need to be processed in a single thread as they arrive; then it is more convenient to save all results to a single results queue (its key must differ from the request queue key).

To do this, set the output_queue key in apiOpts:

lpush aparser_redis_api '["some_unique_id", "Net::HTTP", "default", "https://ya.ru", {}, {"output_queue": "aparser_results"}]'

Retrieving a result from the common queue:

127.0.0.1:6379> blpop aparser_results 0
1) "aparser_results"
2) "{\"queryId\":\"some_unique_id\",\"results\":{\"data\":\"<!DOCTYPE html><html class=...
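A minimal Python sketch of a single-threaded consumer for the shared results queue, assuming a redis-py style client. `submit_to_shared_queue` and `consume` are hypothetical names; the message fields queryId and results are taken from the output shown above.

```python
import json
from typing import Any, Callable, Optional


def submit_to_shared_queue(client: Any, query_id: str, parser: str, preset: str,
                           query: str, output_queue: str = "aparser_results",
                           queue_key: str = "aparser_redis_api") -> None:
    """Send a request whose result should go to the common output_queue."""
    api_opts = {"output_queue": output_queue}
    client.lpush(queue_key, json.dumps([query_id, parser, preset, query, {}, api_opts]))


def consume(client: Any, handle: Callable[[str, dict], None],
            output_queue: str = "aparser_results", timeout: int = 5,
            max_items: Optional[int] = None) -> int:
    """Pass (queryId, results) to `handle` for each message as it arrives.

    Stops after `timeout` seconds without new results (keep it >= 1, since
    0 would block forever) or after `max_items` messages. Returns the count.
    """
    processed = 0
    while max_items is None or processed < max_items:
        reply = client.blpop([output_queue], timeout=timeout)
        if reply is None:
            break
        _key, value = reply
        message = json.loads(value)
        handle(message["queryId"], message["results"])
        processed += 1
    return processed
```

Because every message carries its queryId, the handler can match results back to requests even though they all share one key.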

Example implementation (SpySERP case)

Suppose we are building a SaaS service that evaluates domain parameters; for simplicity, it will only check the domain registration date.

Our service consists of 2 pages:

  • /index.php - landing page where the domain entry form is located
  • /results.php?domain=google.com - page with the service results

To improve the user experience, we want the service pages to load instantly, with a loader displayed while the data is being retrieved so that waiting looks natural.

When results.php is requested, we first send a request to the A-Parser Redis API with a unique request_id:

lpush aparser_redis_api '["request-1", "Net::Whois", "default", "google.com", {}, {}]'

After that, we can render the page and show a loader in the data area. Because the page does not wait for the result, the server response time is limited only by the speed of the Redis connection (usually within 10 ms).
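The results.php side of this flow can be sketched in Python as a framework-agnostic function. It assumes a redis-py style client; `start_domain_check` and `results_url` are hypothetical names, and the request id is built from `uuid.uuid4()` as one example of a sufficiently random value.

```python
import json
import uuid
from typing import Any
from urllib.parse import urlencode


def start_domain_check(client: Any, domain: str,
                       queue_key: str = "aparser_redis_api") -> str:
    """Enqueue a Net::Whois request for `domain` and return its request_id.

    Called while rendering results.php; the page is sent right after this
    returns, so the user never waits for A-Parser at this step.
    """
    request_id = f"request-{uuid.uuid4().hex}"  # unique, hard-to-guess id
    payload = [request_id, "Net::Whois", "default", domain, {}, {}]
    client.lpush(queue_key, json.dumps(payload))
    return request_id


def results_url(request_id: str) -> str:
    """URL that the page's AJAX call should fetch to get the result."""
    return "/get-results.php?" + urlencode({"request_id": request_id})
```

The returned id is embedded in the rendered page so that its script can later request `results_url(request_id)`.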

A-Parser starts processing the request even before the user's browser receives the first content. Once the browser has loaded all necessary resources and scripts, we can display the result by sending an AJAX request for the data:

/get-results.php?request_id=request-1

The get-results.php script performs a blocking request to Redis with a 15-second timeout:

blpop aparser_redis_api:request-1 15

It returns the response as soon as A-Parser delivers it; if the timeout expires and the result is null, we can show the user a data retrieval error.
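The get-results.php side can be sketched the same way: a framework-agnostic Python handler, assuming a redis-py style client, that waits up to 15 seconds and turns a timeout into an error response. `get_results`, the (status, body) return convention, and the 504 status code are illustrative assumptions, not part of A-Parser.

```python
import json
from typing import Any, Tuple


def get_results(client: Any, request_id: str, timeout: int = 15,
                queue_key: str = "aparser_redis_api") -> Tuple[int, str]:
    """Block up to `timeout` seconds for the result of `request_id`.

    Returns an (HTTP status, JSON body) pair for the web framework to send.
    """
    reply = client.blpop([f"{queue_key}:{request_id}"], timeout=timeout)
    if reply is None:
        # Timed out: the page shows a data retrieval error instead of the result.
        return 504, json.dumps({"error": "data retrieval timed out"})
    _key, value = reply
    # redis-py returns bytes unless decode_responses=True is set.
    body = value.decode("utf-8") if isinstance(value, bytes) else value
    return 200, body
```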

Thus, by sending the request to A-Parser as soon as the page is first opened (/results.php), we shorten the user's wait for data (/get-results.php) by the time the browser spends receiving content, loading scripts, and executing the AJAX request.