
Using Regular Expressions

General Information

A-Parser uses Perl/JavaScript-compatible regular expressions that can be used:

  • When parsing arbitrary information from any website
  • In the query builder, to extract or replace part of the query
  • In the results builder, to transform any results
  • When using filters
  • When checking whether a next page exists in the Net::HTTP scraper

Detailed documentation on regular expressions can be found in the sources listed at the end of this page.

In A-Parser, any result can be processed with a regular expression using the Parse custom result option:

Using the Use regular expression option

Usage Specifics and Flags

  • Regular expressions are written without the // delimiters
  • The following flags are supported:
    • i - case-insensitive search
    • s - the dot (.) matches any character, including line breaks
    • g - global search or replacement
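A rough illustration of what the three flags change, using Python's `re` module as a stand-in. This is an assumption for demonstration only: A-Parser's engine is not Python, but `re.I` and `re.S` behave like the `i` and `s` flags, and `findall()` versus `search()` mirrors using or omitting `g`. The sample text is made up.

```python
import re

text = "<p>First\nline</p><P>Second</P>"
pattern = r"<p>(.+?)</p>"

# No flags: case-sensitive, and . does not cross the line break, so nothing matches
plain = re.findall(pattern, text)

# s: the dot also matches "\n", so the lowercase paragraph now matches
dotall = re.findall(pattern, text, re.S)

# i + s: <P>...</P> matches as well; findall() collects every match, like g
both = re.findall(pattern, text, re.I | re.S)

# Without g only the first match is taken: search() stops at the first hit
first = re.search(pattern, text, re.I | re.S).group(1)

print(plain, dotall, both, repr(first))
```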

Additionally, a flag can be set inside the regular expression itself. For example, to find every line that contains the word test in the whole text (or the page code, depending on what the regular expression is applied to), use the m flag (multi-line: ^ and $ match the beginning and end of each line):

(?m)^(.+?test.+?)$
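The inline `(?m)` modifier is also understood by Python's `re` module, so the expression can be tried outside A-Parser (Python is used here only for illustration, on a made-up text). Note that `.+?` requires at least one character on each side of test, so a line that starts or ends with the word is not matched:

```python
import re

text = "first line\nthis is a test line\ntest at the start\nanother test here\nno match"

# (?m): ^ and $ match at every line boundary, not only at the ends of the text.
# Without the s flag, . does not match "\n", so each match stays inside one line.
lines = re.findall(r"(?m)^(.+?test.+?)$", text)
print(lines)
```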

Extraction of Any Information

Description of working with regular expressions in A-Parser

With the Parse custom result option or the Results builder, regular expressions can be used to extract arbitrary information from parsing results, for example, from the source code of HTML pages or from already prepared results.

  • In the Parse result field, select the scraper result to process; it can be a simple result or an array
  • The regular expression is entered without delimiters, and flags can be specified after it
  • Result type sets the type of the result: Flat (simple result) or Array. If an array is selected as the source result, or the g flag is used, the result is always saved as an array. The Name field sets the name of the array
  • Each capturing group of the regular expression can be saved as a separate element; enter the element name in the corresponding field $1 to, $2 to, ..., where the number is the index of the capturing group
  • The RegEx field supports the template engine, so the query can be used as part of the regular expression
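To make the Flat/Array distinction concrete, here is a small Python model of the rules listed above. It is a sketch under assumptions, not A-Parser's implementation: the function `parse_custom_result`, its parameters and the dict-based output are hypothetical, and array source results and the template engine are not modelled. Only the rules themselves (g produces an array, each capturing group is stored under the name entered for $1, $2, ...) come from the list.

```python
import re


def parse_custom_result(source, regex, flags="", names=("$1",)):
    """Hypothetical model of the Parse custom result option.

    Returns one dict (Flat) for the first match, or a list of dicts
    (Array) when the g flag is set. Returns None in Flat mode if
    nothing matches.
    """
    re_flags = 0
    if "i" in flags:
        re_flags |= re.IGNORECASE
    if "s" in flags:
        re_flags |= re.DOTALL
    compiled = re.compile(regex, re_flags)

    def to_element(match):
        # Capturing group N is stored under the name entered for $N
        return {name: match.group(n) for n, name in enumerate(names, start=1)}

    if "g" in flags:
        # g flag: every match becomes one array item
        return [to_element(m) for m in compiled.finditer(source)]
    match = compiled.search(source)
    return to_element(match) if match else None


html = '<a href="/a">A</a> <a href="/b">B</a>'
flat = parse_custom_result(html, r'<a href="(.+?)">(.+?)</a>', "", ("link", "anchor"))
array = parse_custom_result(html, r'<a href="(.+?)">(.+?)</a>', "g", ("link", "anchor"))
print(flat)
print(array)
```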

The newly created results can be used in result formatting, in the results builder, in filtering and deduplication of results, or in a subsequent Parse custom result option. This option is similar to the results builder when using RegEx Match.

Example preset for parsing image links from the HTML source code

To solve this task, we use the Net::HTTP scraper to get the source code of the page. We apply the regular expression with the isg flags to $data (the downloaded page) and save the result to the images array, in the src element. In the result format, we output all src elements separated by a line break.
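The preset's actual regular expression is stored in the encoded example and is not reproduced in the text. As a rough stand-in, the sketch below uses a hypothetical pattern `<img[^>]+?src=["'](.+?)["']`, with Python's `re.I | re.S` in place of the i and s flags and `findall()` in place of g, then joins the src values with line breaks as the result format does. Downloading the page (what Net::HTTP provides as $data) is left out; the HTML string is a made-up sample.

```python
import re

# Hypothetical pattern; the regex inside the preset may differ
IMG_SRC = re.compile(r"""<img[^>]+?src=["'](.+?)["']""", re.I | re.S)


def image_links(data):
    # findall() collects every match, like the g flag
    return IMG_SRC.findall(data)


def format_result(srcs):
    # Output all src elements separated by a line break
    return "\n".join(srcs)


# Made-up sample standing in for the downloaded page ($data)
data = '<IMG class="flag" src="/img/lang/en.png">\n<img\n  alt="" src=\'img/logo.png\'>'
print(format_result(image_links(data)))
```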

For the query http://a-parser.com/, the result file contains the following list:

/img/lang/en.png  
/img/lang/ru.png
img/[email protected]
https://files.a-parser.com/img/site/tour_ru/V1qpV.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_1_all_parsers_list.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_1_quick_task.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_2_task_editor_easy.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_3_task_editor_analyze_domains.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_4_task_editor_parse_emails.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_5_queue_fast_google.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_6_queue_spyserp.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_7_javascript_parser.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_8_scheduler.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_9_settings.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_10_proxies.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_11_templates.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_12_task_tester.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_13_parser_test.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_14_api.png
https://files.a-parser.com/img/site/tour_ru/tour_ru_15_resources.png
data/avatars/s/0/12.jpg?1507557563
data/avatars/s/0/12.jpg?1507557563
data/avatars/s/13/13392.jpg?1570706020
data/avatars/s/16/16560.jpg?1586782475
data/avatars/s/1/1240.jpg?1537376153
styles/uix/xenforo/avatars/avatar_s.png
data/avatars/s/0/371.jpg?1412969226
styles/uix/xenforo/avatars/avatar_s.png
//mc.yandex.ru/watch/26891250

How to import the example into A-Parser


eJxtVN9v2jAQ/l8sJArqYH3YS7Stokhomxgwmj5BJlnkyLz612yHFUX533d2Egfa
vYDv7rvvvvNdXBFH7bPdGLDgLEl2FdHhTBKSw5GW3JFboqmxYHx4R1bgkuRLmm7Q
HxEVcWcNmHMorVNiC7ZJNM0BuaijwS7gBc2PTBS7n5+zsTWH/d6OP/mf3XBPspvJ
+H4UTh08bZiZLSJh66LG0DM6w/+KigATtAAbkV4zwSIkq3uR6gTGsBwQxXK0j8oI
6kwn+kR56WGDhmvShG+GgyBWDkekzrJYYBGiHq7vJu3dxeAjPUGqfAnGoXcv0Gr1
DvBmwEe7MqOJe/EMNM+ZY0pS3lTwnfRVnyT7E0RKhVg8GgZ2YZRAl4NA4J3nTt2O
DIJNkKIMuT+aHJIcKbdwSyxKXVAUkr+OMAeGOmXW2utBf0WUnHG+hBPwHhb4H0rG
c1yV2RGTvraJ/4es33DUsb3LUjisvwY1RJZgPay/91m5WqqiuwzOBHNo27kqpR/M
e3Q+A+h4ZysPE8pALNMyt9Xxa9Ag/Wb0I5vp3nXVxtVYrp0HJY+sWLfb1iFLmeIn
t5ZzJTQH35csOcexWNj26zGz7Ri80Qt8nTwPJa4+VqcUt98eG6naMFy/D16gwJu8
rNpSHijnT9vlZYT0K4XGL+e0TaZT+q55BiYHJabEJzooFK4UtlVn8ZGIT0l18VQk
VY1j+m03Dcb35BHow8uxOAOS3NX/AFJvlP8=

Regular Expression Builder

The Regular Expression Builder was added in version 1.2.78.

You can find it on the Tools tab -> Regular Expression Builder. Page code obtained in the Test scraper can also be sent directly to the builder: enable debug mode and click the Go to RegEx Builder link.

getting the page code and opening the regular expression builder

In the builder, you can choose the programming language in which the resulting regular expressions will be used.

To work with the builder, paste the source text into the left field (when you open the builder from the Test scraper via Go to RegEx Builder, the text is inserted automatically). Configure the parameters of the future regular expression on the right.

To create a simple regular expression (for example, to get the title), it is enough to fill in the necessary fields:

  • In the Before group field, enter the characters that precede the information we need
  • In the After group field, enter the characters that follow the desired data
  • In the Group starts with field, specify the characters with which the desired string should begin
  • In the Group ends with field, specify the characters that should be at the end of the desired string

example of getting the title using the regular expression builder

As the screenshot above shows, we are creating a regular expression that selects the page title. We put <title> before the group and </title> after it, and, as an example, specify that the desired string starts with the letter A.
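How the Before group, After group, Group starts with and Group ends with fields might combine can be sketched in Python. This is an assumption about the builder's logic, not its actual code: the function `build_regex` is hypothetical, literal text is escaped with `re.escape()`, and the group body is a lazy `.+?`. The real builder may escape characters differently (in the list example on this page it writes `/` as `\/`).

```python
import re


def build_regex(before, after, starts_with="", ends_with=""):
    """Hypothetical model of the builder's four text fields."""
    # The captured group: required prefix, lazy body, required suffix
    group = re.escape(starts_with) + ".+?" + re.escape(ends_with)
    # Text before and after the group is matched literally, outside the group
    return re.escape(before) + "(" + group + ")" + re.escape(after)


pattern = build_regex("<title>", "</title>", starts_with="A")
print(pattern)

match = re.search(pattern, "<html><title>A-Parser - scraper</title></html>")
print(match.group(1))
```

With Python 3.7 or later, `re.escape()` leaves `<`, `>` and `/` unescaped, so the printed pattern is `<title>(A.+?)</title>`.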

To test the resulting regular expression fully, you can enable the flags you need: g, s, and i.

You can also create more complex regular expressions with two or more groups. For example, let's create a regular expression that collects all links and their anchor texts from list items (<li>). To do this, enable the g flag and add a second group: the first group will capture the links and the second the anchor texts.

example of using groups in the regular expression builder

After setting the necessary parameters for both groups, we get the regular expression:

<li><a href="(.+?)">(.+?)<\/a

To check the created regular expression, click the Test button.

checking the created regex in the regular expression builder

After the regular expression runs, its results are displayed at the bottom: the full match and the captured groups. Double-clicking any element in the result table scrolls the source text to that match.
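The builder's result table can be approximated in Python with `re.finditer()`, which yields the full match and each captured group for every hit, the effect of the g flag. The HTML below is a made-up sample; the pattern is the one built above, with `\/` kept as is, since Python treats a backslash before a slash as a literal slash.

```python
import re

pattern = r'<li><a href="(.+?)">(.+?)<\/a'
html = '<ul><li><a href="/docs">Docs</a></li><li><a href="/forum">Forum</a></li></ul>'

# One row per match: full string, group 1 (link), group 2 (anchor text)
rows = [(m.group(0), m.group(1), m.group(2)) for m in re.finditer(pattern, html)]
for row in rows:
    print(row)
```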

Regular expressions for the little ones

My name is Vitaly Kotov and I know a little about regular expressions. Under the cut, I will tell you the basics of working with them...

Regular expressions (regexp) - basics

Regular expressions are a mechanism for finding and replacing text. In a string, file, multiple files...

⏩Parsing industrial equipment catalog

An example of using regular expressions in parsing an industrial equipment catalog

⏩Parsing the Booking.com resource

An example of using regular expressions in parsing the Booking.com resource

⏩Finding contact pages

An example of using regular expressions in parsing contact pages