Using regular expressions
Overview
A-Parser uses Perl-compatible regular expressions, which can be used:
- When parsing arbitrary information from any site
- In the Query builder, to extract or replace part of a query
- In the Results builder, to transform any results
- When using filters
- When checking for the presence of a next page in the Net::HTTP parser
Detailed documentation on regular expressions can be found in the following sources:
- Regular expressions on Wikipedia
- Universal encyclopedia of PCRE-standard regular expressions
- Forum topic
Usage features in A-Parser
- Regular expressions are written without the // delimiters
- The following flags are supported:
- i - case-insensitive matching
- s - the dot matches any character, including line breaks
- g - global search or replacement
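As a rough illustration of what these flags do, the sketch below uses Python's `re` module. This is not A-Parser's own engine, but it has analogous options: `re.IGNORECASE` corresponds to i, `re.DOTALL` to s, and `re.findall` plays the role of g. The sample text is made up for the example.

```python
import re

text = "<p>First\nline</p><P>second</P>"

# No flags: matching is case-sensitive and the dot stops at the line break,
# so neither paragraph matches.
no_flags = re.search(r"<p>(.*?)</p>", text)

# i + s: case-insensitive, and the dot also matches "\n".
first = re.search(r"<p>(.*?)</p>", text, re.IGNORECASE | re.DOTALL)

# g: collect every match instead of only the first one.
all_matches = re.findall(r"<p>(.*?)</p>", text, re.IGNORECASE | re.DOTALL)

print(no_flags)        # None
print(all_matches)     # ['First\nline', 'second']
```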
Starting with version 1.2.78, a Regular Expression Builder is available.
You can find it on the Tools tab -> Regular Expression Builder.
You can also send the received page source to the builder directly from Test parsing. To do this, enable debug mode and click the Go to RegEx Builder link.
In the builder, you can choose the programming language in which the resulting regular expression will be used.
To work with the builder, insert the source text into the left field (it is inserted automatically when you arrive from Test parsing via the Go to RegEx Builder link). On the right, set the parameters of the future regular expression.
To compose a simple regular expression (for example, to get a title), it is enough to specify the necessary elements:
- In the Before group field, enter the characters that precede the information you need
- In the After group field, enter the characters that follow the required data
- In the Group begins with field, enter the characters with which the captured string should begin
- In the Group ends with field, enter the characters with which the captured string should end
As you can see in the screenshot above, we compose a regular expression that selects the title of the site. Before the group we put <title> and after the group </title>, and, as an example, we also specify that the captured string begins with the letter T.
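The way these four fields combine into one pattern can be sketched in Python. The `build_pattern` helper below is hypothetical and only approximates the idea; the exact expression the builder generates may differ. It escapes the literal field values and wraps the part between them in a non-greedy capturing group.

```python
import re

def build_pattern(before, after, begins_with="", ends_with=""):
    """Hypothetical helper: combine the four builder fields into one pattern."""
    # Field values are treated as literal text; the group itself is non-greedy.
    group = "(" + re.escape(begins_with) + ".*?" + re.escape(ends_with) + ")"
    return re.escape(before) + group + re.escape(after)

html = "<html><head><title>Test page</title></head></html>"

# Before group: <title>, After group: </title>, Group begins with: T
pattern = build_pattern("<title>", "</title>", begins_with="T")
match = re.search(pattern, html)

print(pattern)
print(match.group(1))  # Test page
```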
To fully test the resulting regular expression, you can enable the necessary flags: g, s and i.
You can also build more complex regular expressions with two or more groups.
For example, let's create a regular expression that collects all links and anchors in a list (<li>). To do this, enable the g flag and add a second search group: the first group will capture links and the second will capture anchors.
After specifying the necessary parameters for both groups, we get the regular expression.
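A two-group expression of this kind might look like the Python sketch below. The pattern and the sample HTML are assumptions made for illustration, not the builder's actual output; `re.findall` stands in for the g flag and returns one (link, anchor) pair per list item.

```python
import re

html = """<ul>
<li><a href="https://example.com/one">First link</a></li>
<li><a href="https://example.com/two">Second link</a></li>
</ul>"""

# Group 1 captures the link, group 2 captures the anchor text.
pattern = r'<li><a href="(.*?)">(.*?)</a></li>'

pairs = re.findall(pattern, html, re.DOTALL)
for url, anchor in pairs:
    print(url, "->", anchor)
```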
After the regular expression is executed, its result is displayed below: the full match and the captured groups. Double-clicking any item in the result table scrolls the source text to the location of that match.
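A comparable result table can be produced outside the builder. The sketch below, again in Python with made-up sample text, lists the full match, the captured groups, and the offset of each match in the source text, which is the position the builder would scroll to.

```python
import re

text = "<title>Test page</title>\n<title>Tools</title>"

rows = []
for m in re.finditer(r"<title>(T.*?)</title>", text):
    # Full match, tuple of captured groups, start offset in the source text.
    rows.append((m.group(0), m.groups(), m.start()))

for full, groups, offset in rows:
    print(offset, full, groups)
```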