
A-Parser Integration with Redis: Advanced API

Comparison with HTTP API

The A-Parser Redis API was developed as a more performant replacement for the oneRequest and bulkRequest methods and to support additional use cases:

  • Redis acts as the request and results server
  • the ability to request results asynchronously or in a blocking mode
  • the ability to connect multiple scrapers (on the same server or on different servers) to process requests from a single entry point
  • the ability to set the number of threads for request processing and view operation logs
  • the ability to set timeouts for operations
  • automatic expiration (Expire) of unclaimed results

Forum discussion thread

Preliminary Setup

Unlike the A-Parser HTTP API, the Redis API requires you to first configure and run a job with the API::Server::Redis scraper:

  • install and run a Redis server (locally or remotely)
  • create a settings preset for the API::Server::Redis scraper and specify:
    • Redis Host - Redis server address, default 127.0.0.1
    • Redis Port - Redis server port, default 6379
    • Redis Queue Key - key name for data exchange with A-Parser, default aparser_redis_api; you can create separate queues and process them with different jobs or different A-Parser copies
    • Result Expire(TTL) - result lifetime in seconds, used to automatically remove unclaimed results, default 3600 seconds (1 hour)
  • add a job with the API::Server::Redis scraper
    • as the job queries, you must specify {num:1:N}, where N must match the number of threads specified in the job
    • you can also enable the logging option, which makes it possible to view the log for each request

Example of setting up a job with API::Server::Redis

Getting API request

Running A-Parser together with Redis using docker-compose

With this launch method, you can specify the service name instead of an IP address as the Redis server address (Redis Host); in the examples below it is redis.

If A-Parser has not been run previously via docker-compose

  1. Download and unpack the distribution (you first need to get a one-time link in the Personal Account, as described here):
curl -O https://a-parser.com/members/onetime/ce42f308eaa577b5/aparser.tar.gz
tar zxf aparser.tar.gz
rm -f aparser.tar.gz
  2. Create the file docker-compose.yml and place the following content in it:

    • Basic option without a password and without opening a port; Redis will only be available within the Docker network
    version: '3'

    services:
      a-parser:
        image: aparser/runtime:latest
        command: ./aparser
        restart: always
        volumes:
          - ./aparser:/app
        ports:
          - 9091:9091

      redis:
        image: redis:latest
        restart: always
    • Option with a password and an open port; Redis will be available externally, so using a password is highly recommended
    version: '3'

    services:
      a-parser:
        image: aparser/runtime:latest
        command: ./aparser
        restart: always
        volumes:
          - ./aparser:/app
        ports:
          - 9091:9091

      redis:
        image: redis:latest
        restart: always
        command: redis-server --requirepass PASSWORD_FOR_REDIS_HERE
        ports:
          - 6379:6379

    Replace PASSWORD_FOR_REDIS_HERE with a password of your choice; it will be used for authorization in Redis.

  3. Start the containers:

docker compose up -d

If A-Parser has already been run previously via docker-compose

  1. Edit the file docker-compose.yml, adding the following content at the end:

    • Basic option without a password and without opening a port; Redis will only be available within the Docker network
      redis:
        image: redis:latest
        restart: always
    • Option with a password and an open port; Redis will be available externally, so using a password is highly recommended
      redis:
        image: redis:latest
        restart: always
        command: redis-server --requirepass PASSWORD_FOR_REDIS_HERE
        ports:
          - 6379:6379

    Replace PASSWORD_FOR_REDIS_HERE with a password of your choice; it will be used for authorization in Redis.

  2. Start the containers:

docker compose up -d
Note: If A-Parser is already running and its configuration has not changed, it will not be restarted; Docker will simply add and start Redis.

Executing Requests

The Redis API is built on Redis Lists. List operations allow adding an unlimited number of requests to the queue (limited only by RAM) and receiving results either in blocking mode with a timeout (blpop) or asynchronously (lpop).

  • all settings except useproxy, proxyChecker and proxybannedcleanup are taken from the preset of the scraper being called + overrideOpts
  • the settings useproxy, proxyChecker and proxybannedcleanup are taken from the API::Server::Redis preset + overrideOpts

A request is added to Redis with the lpush command. Each request is an array [queryId, parser, preset, query, overrideOpts, apiOpts] serialized as JSON:

  • parser, preset, query - correspond to the same-named parameters of the oneRequest API request
  • queryId - generated together with the request; we recommend using a sequence number from your database or a good random value. The result is retrieved using this ID
  • overrideOpts - request settings override for the scraper preset
  • apiOpts - additional API processing parameters
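The request format above can be sketched in Python as follows. This is a minimal illustration, not part of A-Parser: the helper names (new_query_id, build_request, submit) are hypothetical, and client is assumed to be any object with a redis-py style lpush(key, value) method, for example redis.Redis(host="127.0.0.1", port=6379) from the redis package.

```python
import json
import uuid

QUEUE_KEY = "aparser_redis_api"  # default Redis Queue Key from the preset


def new_query_id():
    """Return a random ID; a database sequence number works just as well."""
    return uuid.uuid4().hex


def build_request(query_id, parser, preset, query, override_opts=None, api_opts=None):
    """Serialize one request as the JSON array described above."""
    return json.dumps(
        [query_id, parser, preset, query, override_opts or {}, api_opts or {}]
    )


def submit(client, payload, queue_key=QUEUE_KEY):
    """Add the serialized request to the queue with LPUSH (assumed client API)."""
    return client.lpush(queue_key, payload)
```

Keeping serialization separate from the push makes the payload easy to inspect or log before it reaches Redis.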
Note: When requesting through Redis, the result formatting stage is skipped; the entire result is transmitted as JSON for subsequent programmatic processing.

redis-cli

Example of executing requests; for testing you can use redis-cli:

127.0.0.1:6379> lpush aparser_redis_api '["some_unique_id", "Net::HTTP", "default", "https://ya.ru"]'
(integer) 1
127.0.0.1:6379> blpop aparser_redis_api:some_unique_id 0
1) "aparser_redis_api:some_unique_id"
2) "{\"data\":\"<!DOCTYPE html><html.....

Various Use Cases

Asynchronous Check for the Result

lpop aparser_redis_api:some_unique_id

Returns the result if the request has already been processed, or nil if it is still being processed.
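The same non-blocking check can be sketched in Python. The function name poll_result is hypothetical, and client is assumed to expose a redis-py style lpop(key) that returns None when the list is empty; the result key follows the queue_key:query_id pattern used in the redis-cli example.

```python
import json


def poll_result(client, query_id, queue_key="aparser_redis_api"):
    """Return the parsed result, or None if it is not ready yet."""
    raw = client.lpop(f"{queue_key}:{query_id}")
    if raw is None:
        return None  # request is still being processed
    return json.loads(raw)
```

A caller would typically invoke this periodically and do other work in between.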

Blocking Result Retrieval

blpop aparser_redis_api:some_unique_id 0

This request blocks until the result is received. Instead of 0 you can specify a maximum timeout in seconds, after which the command returns nil.
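A blocking wait with a timeout might look like this in Python. This is a sketch under assumptions: wait_result is a hypothetical name, and client is assumed to provide a redis-py style blpop(key, timeout=...) that returns a (key, value) pair, or None when the timeout expires.

```python
import json


def wait_result(client, query_id, timeout=15, queue_key="aparser_redis_api"):
    """Block until the result arrives; return None if the timeout expires."""
    item = client.blpop(f"{queue_key}:{query_id}", timeout=timeout)
    if item is None:
        return None  # timed out, no result yet
    _key, raw = item  # BLPOP returns the key name and the popped value
    return json.loads(raw)
```

As with the redis-cli command, timeout=0 means waiting indefinitely.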

Saving Results to a Single Queue

By default, A-Parser saves the result of each request under its own unique key aparser_redis_api:query_id, which allows multithreaded processing: each thread sends its requests and receives its results separately.

In some cases, results need to be processed in a single thread as they arrive; it is then convenient to save them to a single results queue (its key must differ from the key used for requests).

To do this, specify the output_queue key in apiOpts:

lpush aparser_redis_api '["some_unique_id", "Net::HTTP", "default", "https://ya.ru", {}, {"output_queue": "aparser_results"}]'

Retrieving the result from the shared queue:

127.0.0.1:6379> blpop aparser_results 0
1) "aparser_results"
2) "{\"queryId\":\"some_unique_id\",\"results\":{\"data\":\"<!DOCTYPE html><html class=...
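A single-threaded consumer for the shared queue could be sketched as below. The function name drain_results and the handle callback are hypothetical; client is assumed to offer a redis-py style blpop. The queryId and results fields are the ones visible in the output above.

```python
import json


def drain_results(client, handle, output_queue="aparser_results", timeout=5):
    """Process results as they arrive; stop once the queue stays empty
    for `timeout` seconds. Returns the number of processed results."""
    processed = 0
    while True:
        item = client.blpop(output_queue, timeout=timeout)
        if item is None:
            return processed  # nothing arrived within the timeout
        _key, raw = item
        message = json.loads(raw)
        # Each message carries the originating request ID and its results.
        handle(message["queryId"], message["results"])
        processed += 1
```

Because every message includes queryId, the consumer can match results to requests without knowing the order in which they complete.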

Implementation Example (SpySERP Use Case)

Suppose we are creating a SaaS service that evaluates domain parameters; for simplicity, we will check only the domain registration date.

Our service consists of 2 pages:

  • /index.php - landing page with the domain input form
  • /results.php?domain=google.com - page with service operation results

To improve the user experience, we want our service pages to load instantly, and the wait for data to look natural and display a loader.

When results.php is requested, we first send a request to the A-Parser Redis API, generating a unique request_id:

lpush aparser_redis_api '["request-1", "Net::Whois", "default", "google.com", {}, {}]'

After that, we can display the page to the user and show a loader in the data display area; since nothing else delays it, the server response time is limited only by the Redis connection speed (usually within 10 ms).

A-Parser starts processing the request even before the user's browser receives the first content. After the browser has loaded all the necessary resources and scripts, we can display the result; to do this, we send an AJAX request to retrieve the data:

/get-results.php?request_id=request-1

The get-results.php script performs a blocking request to Redis with a 15-second timeout:

blpop aparser_redis_api:request-1 15

It returns the response as soon as it is received from A-Parser; if we receive a null result due to a timeout, we can display a data retrieval error to the user.

Thus, by sending the request to A-Parser when the page is first opened (/results.php), we reduce the time the user waits for data (/get-results.php) by the time the browser spends waiting for content, loading scripts, and executing the AJAX request.
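The two server-side steps of this flow can be sketched as follows. The example pages are PHP scripts; Python is used here only for illustration. The function names start_lookup and get_results and the shape of the returned dictionary are assumptions, and client is assumed to be a redis-py style object with lpush and blpop.

```python
import json

QUEUE_KEY = "aparser_redis_api"  # default Redis Queue Key


def start_lookup(client, request_id, domain):
    """Step 1 (results page): enqueue the request and return immediately."""
    payload = json.dumps([request_id, "Net::Whois", "default", domain, {}, {}])
    client.lpush(QUEUE_KEY, payload)


def get_results(client, request_id, timeout=15):
    """Step 2 (AJAX endpoint): block for up to `timeout` seconds."""
    item = client.blpop(f"{QUEUE_KEY}:{request_id}", timeout=timeout)
    if item is None:
        # Timeout: the page can show a data retrieval error.
        return {"status": "error", "message": "timed out waiting for data"}
    return {"status": "ok", "result": json.loads(item[1])}
```

The first function does no waiting, so the page render is not delayed; all waiting happens in the second call, which overlaps with the browser loading the page.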