Using Regular Expressions

General Information

A-Parser uses Perl/JavaScript-compatible regular expressions, which can be used:

Detailed documentation on regular expressions can be found in the following sources:

In A-Parser, any result can be processed with a regular expression by means of the Use RegEx option:

Option to Parse custom result

Usage Specifics and Flags

  • Regular expressions are written without delimiters //
  • The following flags are supported:
    • i - case-insensitive search
    • g - global search or replacement
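
The effect of the two flags can be sketched outside A-Parser. The snippet below is only an illustration in Python, whose `re` module stands in for A-Parser's engine: `re.IGNORECASE` plays the role of `i`, and the choice between `re.search` (first match) and `re.findall` (all matches) plays the role of `g`. The sample text is made up.

```python
import re

text = "Test one, test two, TEST three"  # made-up sample text
pattern = r"test \w+"                     # written without // delimiters

# No flags: case-sensitive, first match only
first = re.search(pattern, text)
first_match = first.group(0) if first else None

# i: case-insensitive, still only the first match
first_i = re.search(pattern, text, re.IGNORECASE).group(0)

# i + g: case-insensitive, every match
all_i = re.findall(pattern, text, re.IGNORECASE)

print(first_match)  # test two
print(first_i)      # Test one
print(all_i)        # ['Test one', 'test two', 'TEST three']
```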

(?m)^(.+?test.+?)$
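
The pattern above starts with the inline modifier `(?m)`, which makes `^` and `$` match at the start and end of every line, so the capturing group holds each whole line that contains `test` with at least one character on either side of it. A quick illustration in Python (used here only as a stand-in for A-Parser's engine, with invented sample text):

```python
import re

# Invented sample text: four lines, two of which qualify
text = "first line\na test line here\ntest\nanother test!\n"

# (?m) makes ^ and $ anchor to each line instead of the whole string
lines = re.findall(r"(?m)^(.+?test.+?)$", text)

# The bare "test" line is skipped: .+? needs at least one character
# before and after the word
print(lines)  # ['a test line here', 'another test!']
```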

Extracting Arbitrary Information

Description of working with regular expressions in the task editor

With the Use RegEx option or the Results Constructor, regular expressions can extract arbitrary information from scraping results, for example from the source HTML code of pages or from already prepared results.

  • In Apply to, select the scraper result to process; it can be a simple result or an array
  • The regular expression is entered without delimiters; the flags are specified separately, after the expression
  • Result Type specifies the result type: Flat (simple result) or Array (array). If an array is selected as the source result or the g flag of the regular expression is used, the result will always be saved as an array. The Name field specifies the array name.
  • Each capturing group of the regular expression can be saved as a separate element; the element name is entered in the corresponding field ($1 to, $2 to, and so on), where the digit is the number of the capturing group
  • In the RegEx field you can use the templater, which allows using the query as part of the regular expression
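
How capturing groups map to named elements can be sketched as follows. This is an illustrative Python analogue, not A-Parser's implementation: the helper `extract`, the element names `link` and `anchor`, and the sample HTML are all hypothetical. Without global search the first match is stored as one flat record; with it, every match becomes an element of an array.

```python
import re

def extract(pattern, text, names, global_search=False, flags=0):
    """Map capturing groups $1, $2, ... to the given element names.

    Hypothetical helper: returns one dict (Flat) or a list of dicts (Array).
    """
    regex = re.compile(pattern, flags)
    if global_search:
        # g flag: every match becomes one array element
        return [dict(zip(names, m.groups())) for m in regex.finditer(text)]
    m = regex.search(text)
    return dict(zip(names, m.groups())) if m else None

html = '<a href="/one">First</a> <a href="/two">Second</a>'
pattern = r'<a href="(.+?)">(.+?)</a>'

flat = extract(pattern, html, ["link", "anchor"])
array = extract(pattern, html, ["link", "anchor"], global_search=True)

print(flat)        # {'link': '/one', 'anchor': 'First'}
print(len(array))  # 2
```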

The newly created results can be used for result formatting, in the Results Constructor, in filtering and deduplication of results, or in the next Use RegEx. This option works like the Results Constructor with RegEx Match.

Example of parsing links to images from the source HTML code

To solve this problem, we use the Net::HTTP scraper to get the source code of the page. We apply a regular expression with the flags isg to $data and save the result to the src elements of the images array. In the result format, we specify that all src elements should be output, each on a new line.
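
The exact expression used in this example is stored in the import code and is not reproduced in the text, so the pattern below is an assumed equivalent rather than the original. The sketch uses Python's `re` as a stand-in: `re.IGNORECASE` and `re.DOTALL` correspond to the i and s flags, `findall` to g, and the HTML fragment is invented.

```python
import re

# Invented fragment standing in for $data, the page source
data = '''<img src="/img/lang/en.png" alt="en">
<IMG
  class="flag" SRC="/img/lang/ru.png">
<p>no image here</p>'''

# Assumed pattern: capture the src attribute value of every <img> tag.
# With the s flag, "." also matches line breaks inside a tag.
pattern = r'<img\s.*?src="([^"]+)"'

images = re.findall(pattern, data, re.IGNORECASE | re.DOTALL)

# Result format: one src per line
print("\n".join(images))
```

A pattern this loose can run past the end of a tag that has no src attribute; it is meant to show the flags at work, not to be a robust HTML parser.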

As a result of the scraping for the query http://a-parser.com/, we will get the following list in the result file:

/img/lang/en.png  
/img/lang/ru.png
img/[email protected]
https://files.a-parser.com/img/site/tour_ru/V1qpV.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_1_all_parsers_list.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_1_quick_task.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_2_task_editor_easy.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_3_task_editor_analyze_domains.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_4_task_editor_parse_emails.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_5_queue_fast_google.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_6_queue_spyserp.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_7_javascript_parser.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_8_scheduler.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_9_settings.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_10_proxies.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_11_templates.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_12_task_tester.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_13_parser_test.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_14_api.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_15_resources.png
data/avatars/s/0/12.jpg?1507557563
data/avatars/s/0/12.jpg?1507557563
data/avatars/s/13/13392.jpg?1570706020
data/avatars/s/16/16560.jpg?1586782475
data/avatars/s/1/1240.jpg?1537376153
styles/uix/xenforo/avatars/avatar_s.png
data/avatars/s/0/371.jpg?1412969226
styles/uix/xenforo/avatars/avatar_s.png
//mc.yandex.ru/watch/26891250
Download example

How to import the example into A-Parser


eJxtVN9v2jAQ/l8sJArqYH3YS7Stokhomxgwmj5BJlnkyLz612yHFUX533d2Egfa
vYDv7rvvvvNdXBFH7bPdGLDgLEl2FdHhTBKSw5GW3JFboqmxYHx4R1bgkuRLmm7Q
HxEVcWcNmHMorVNiC7ZJNM0BuaijwS7gBc2PTBS7n5+zsTWH/d6OP/mf3XBPspvJ
+H4UTh08bZiZLSJh66LG0DM6w/+KigATtAAbkV4zwSIkq3uR6gTGsBwQxXK0j8oI
6kwn+kR56WGDhmvShG+GgyBWDkekzrJYYBGiHq7vJu3dxeAjPUGqfAnGoXcv0Gr1
DvBmwEe7MqOJe/EMNM+ZY0pS3lTwnfRVnyT7E0RKhVg8GgZ2YZRAl4NA4J3nTt2O
DIJNkKIMuT+aHJIcKbdwSyxKXVAUkr+OMAeGOmXW2utBf0WUnHG+hBPwHhb4H0rG
c1yV2RGTvraJ/4es33DUsb3LUjisvwY1RJZgPay/91m5WqqiuwzOBHNo27kqpR/M
e3Q+A+h4ZysPE8pALNMyt9Xxa9Ag/Wb0I5vp3nXVxtVYrp0HJY+sWLfb1iFLmeIn
t5ZzJTQH35csOcexWNj26zGz7Ri80Qt8nTwPJa4+VqcUt98eG6naMFy/D16gwJu8
rNpSHijnT9vlZYT0K4XGL+e0TaZT+q55BiYHJabEJzooFK4UtlVn8ZGIT0l18VQk
VY1j+m03Dcb35BHow8uxOAOS3NX/AFJvlP8=

Regular Expression Constructor

The Regular Expression Constructor has been available since version 1.2.78.

You can find it on the Tools tab -> Regular Expression Constructor. You can also send the received page code to it directly from Test Scraping. To do this, enable debug mode and click the Go to RegEx Builder link.

Open the page code in the regular expression constructor

The constructor allows you to select the programming language in which the resulting regular expressions will be used.

To work with the constructor, paste the source text into the left field (it is filled in automatically when you arrive from Test Scraping via Go to RegEx Builder). On the right, configure the parameters of the future regular expression.

To create a simple regular expression (for example, one that gets the title), it is enough to fill in the relevant fields:

  • In the Before Group field, enter the characters that precede the information we need
  • In the After Group field, enter the characters that follow the required data
  • In the Group Starts With field, specify the characters with which the desired string should begin
  • In the Group Ends With field, specify the characters that should be at the end of the desired string
Example of getting a title using a regular expression constructor

As seen in the screenshot above, we create a regular expression that selects the site's title. Before the group we put <title> and after the group </title>, and also, for example, indicate that the desired string starts with the letter W.
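
One way to picture how the four fields combine into an expression is sketched below. The function `build_regex` and the exact way it joins the parts are assumptions made for illustration; the constructor's real output may differ. Python's `re` is used as a stand-in engine, and the sample page is invented.

```python
import re

def build_regex(before="", after="", starts_with="", ends_with=""):
    """Assemble a one-group pattern from four constructor-style fields.

    Hypothetical helper: field values are treated as literal text.
    """
    esc = re.escape
    group = "(" + esc(starts_with) + ".+?" + esc(ends_with) + ")"
    return esc(before) + group + esc(after)

page = "<html><head><title>Welcome to the site</title></head></html>"

# Before Group: <title>, After Group: </title>, Group Starts With: W
pattern = build_regex(before="<title>", after="</title>", starts_with="W")
match = re.search(pattern, page)
title = match.group(1) if match else None

print(title)  # Welcome to the site
```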

For full testing of the resulting regular expression, it is possible to enable the necessary flags: g, s and i.

It is also possible to create more complex regular expressions with two or more groups. For example, let's create a regular expression that collects all links and anchors from the <li> list items. For this, we need to enable the g flag and add a second search group: the first group will contain the links, and the second will contain the anchors.

Example of using groups in the regular expression constructor

By setting the necessary parameters for both groups, we get the regular expression:

<li><a href="(.+?)">(.+?)<\/a
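
Applied to a small list, this expression yields one (link, anchor) pair per match. The sketch below uses Python's `re.findall` in place of the g flag; the HTML is invented, and the escaped slash `\/` is kept as written, since Python accepts it as a literal `/`.

```python
import re

# Invented list markup
html = '''<ul>
<li><a href="https://example.com/one">First</a></li>
<li><a href="https://example.com/two">Second</a></li>
</ul>'''

pattern = r'<li><a href="(.+?)">(.+?)<\/a'

# With two groups, findall returns a (group 1, group 2) tuple per match
pairs = re.findall(pattern, html)
for link, anchor in pairs:
    print(link, "->", anchor)
```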

To test the regular expression, click the Test button:

Checking the created regular expression in the regular expression constructor

After the regular expression is executed, the result of its work is displayed at the bottom: the full string and the captured groups. Double-clicking on any element in the result table scrolls the initial text to the location of that match.
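
The same kind of result table (full string, captured groups, and where the match sits in the source text) can be approximated in a few lines. This is a rough Python analogue of the Test output, not the constructor's own code; the text is invented, and reporting a line number is this sketch's way of expressing the match location.

```python
import re

text = 'intro\n<li><a href="/a">A</a></li>\n<li><a href="/b">B</a></li>\n'
pattern = r'<li><a href="(.+?)">(.+?)<\/a'

rows = []
for m in re.finditer(pattern, text):
    # 1-based line number of the position where the match starts
    line_no = text.count("\n", 0, m.start()) + 1
    rows.append((m.group(0), m.groups(), line_no))

for full, groups, line_no in rows:
    print(f"line {line_no}: {full!r} -> {groups}")
```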