HTML::EmailExtractor - Parsing email addresses from website pages
HTML::EmailExtractor parser overview

Use cases for the HTML::EmailExtractor email parser
Parsing emails from a website, navigating its pages up to a specified depth limit
- Add the option Parse to level and select the required value (limit) from the list.
- In the Requests section, check the option Unique requests.
- In the Results section, check the option Unique by string.
- Specify the link to the website from which you need to parse emails as a request.
Download example
How to import an example into A-Parser
eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr
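The Parse to level option can be pictured as a breadth-first walk that stops expanding links once the chosen depth is reached. The sketch below is a minimal illustration under stated assumptions, not A-Parser's implementation: it uses an in-memory link map (`site`, a hypothetical stand-in for fetched pages), counts the start page as level 0, and follows internal links only. The `seen` set plays roughly the role that Unique requests plays in the task, so no page is queued twice.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link map standing in for real HTTP fetching:
# each page URL maps to the links found on that page.
site = {
    "https://example.com/": ["https://example.com/a", "https://other.org/"],
    "https://example.com/a": ["https://example.com/b"],
    "https://example.com/b": ["https://example.com/c"],
}

def crawl_to_level(start, links, max_level):
    """Return pages in visit order, expanding links only up to max_level.

    Assumes level 0 is the start page and only same-host links are followed.
    """
    host = urlparse(start).netloc
    seen = {start}              # each page is queued at most once
    visited = []
    queue = deque([(start, 0)])
    while queue:
        url, level = queue.popleft()
        visited.append(url)
        if level == max_level:
            continue            # depth limit reached: do not expand further
        for link in links.get(url, []):
            # external links (other hosts) are skipped in this sketch
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited

print(crawl_to_level("https://example.com/", site, 2))
```

With a limit of 2, `/c` (three clicks from the start page) is never requested, and `https://other.org/` is skipped as external.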
Parsing emails from a base of websites, navigating each website to a specified depth limit
- Add the option Parse to level and select the required value (limit) from the list.
- In the Requests section, check the option Unique requests.
- In the Results section, check the option Unique by string.
- Specify the links to the websites from which you need to parse emails as requests, or set Requests from to File and upload a request file containing the website base.
Download example
How to import an example into A-Parser
eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr
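With a website base, each line of the uploaded request file becomes one request. A minimal sketch of reading such a file with Unique requests semantics, assuming one URL per line and that blank lines and surrounding whitespace can be ignored; the helper name `load_requests` is hypothetical:

```python
def load_requests(text):
    """Split request-file contents into unique, non-empty requests.

    Assumes one URL per line; keeps the first occurrence of each URL.
    """
    seen = set()
    requests = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:   # skip blank lines and repeats
            seen.add(url)
            requests.append(url)
    return requests

file_contents = "https://example.com/\n\nhttps://example.org/\n  https://example.com/\n"
print(load_requests(file_contents))
```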
Parsing emails from a base of links
- In the Requests section, check the option Unique requests.
- In the Results section, check the option Unique by string.
- Specify the links from which you need to parse emails as requests, or set Requests from to File and upload a request file containing the links base.
Download example
How to import an example into A-Parser
eJxtU01z0zAQ/S+aHmAmOPTAxbc00wwwaV3a9BRyEPE6COuLXSkpePLfWTmOHZfe
tG/fvv1UI4Kkmh4QCAKJfN0I375FLkqoZNRBTISXSIDJvRafV3fLPL81Uunbl4By
Gxwy5UzebCaCBfhJC4dGJqErf511qr3zSe5h5dhZKQ0DvGDrXhpIUaUMkLxZ1Qq9
e5+Fl6Qgy1IF5azUpwypriHrs1W/Y4qngMrumM8mKqAFOsNwgFYkgX/OFa7FVWsL
lolt/LdTjMgDRpgI4moX3DGUvaOSmtijAqDkERQ+lcR4I5ydab2EPeiB1srfRKVL
nuOs4qAvXeDblOI/jWPf4WWqPeABuYZepbVuirshqnRLt+PGreO2tTIqsE1zF23a
zUcGawDfj+0+0YxD6NN0yl12PhUPtmTmsLWZH6BRG6PNjMGts5XaFdwAqhLOzGhX
fI+FnTvjNaS+bNSat0LwOFzIjLo1JGMo8HXwvE0xuuTgnKavT6dSPSq+wE+pQMOT
vMzaSW6l1s+Py0uPGC6KjZ8heMqn08PhkNV/DaWlZhin3+3Z8wMl4Bjy6Mq4DVuw
4bXLOKpZwoxRqSv5IUBNY5hMpqkVEKnUADvHN8yDPG76P9v/7Obtn5s3R76RX/Rw
oqeBJjJjvBniAxD59fEfH7B6cg==
List of data collected by the email parser
- Email addresses
- Total number of addresses on the page
- Array of all collected pages (used with the Use Pages option)
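To make the first two items concrete, here is a rough sketch of pulling addresses out of a page and counting them. The regular expression is a simplified assumption, not the parser's actual matching rule, and counting each distinct address once per page is likewise an assumption about how the total is computed:

```python
import re

# Simplified address pattern (an assumption, far narrower than RFC 5322).
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def extract_emails(html):
    """Return (distinct addresses in order of first appearance, their count)."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(html)))
    return emails, len(emails)

page = (
    "<p>Write to info@example.com or sales@mail.example.org.</p>"
    '<a href="mailto:info@example.com">Contact</a>'
)
emails, mailcount = extract_emails(page)
print(emails, mailcount)
```

The trailing period after `sales@mail.example.org` is not captured, and the repeated `mailto:` address is counted once.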
Features
- Multi-page parsing (page navigation)
- Detection of follow (as opposed to nofollow) links
- Navigation through internal site pages to a specified depth (the Parse to level option), which lets you go through all pages of a site, collecting internal and external links
- Ability to consider subdomains as internal site pages
- Supports gzip/deflate/brotli compression
- Detection of site encodings and conversion to UTF-8
- Bypassing CloudFlare protection
- Choice of engine (HTTP or Chrome)
- Supports all functionality of HTML::LinkExtractor
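Treating subdomains as internal changes how a link is classified when deciding which pages to follow. A minimal sketch, assuming relative links are resolved against the site URL and that a leading `www.` is ignored when comparing hosts (both assumptions); the function names are hypothetical:

```python
from urllib.parse import urljoin, urlparse

def _host(url):
    """Host name with a leading "www." removed (an assumption)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_internal(link, site, subdomains_are_internal=False):
    """Classify `link` (absolute or relative) as internal to `site`."""
    link_host = _host(urljoin(site, link))   # resolve relative links first
    site_host = _host(site)
    if link_host == site_host:
        return True
    # Optionally treat blog.example.com as part of example.com.
    return subdomains_are_internal and link_host.endswith("." + site_host)

print(is_internal("https://blog.example.com/", "https://example.com/"))
print(is_internal("https://blog.example.com/", "https://example.com/", True))
```

The `"." + site_host` suffix check keeps look-alike domains such as `notexample.com` from being counted as subdomains.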
Use cases
- Parsing email addresses
- Outputting the number of email addresses
Query examples
Queries must be links to pages, for example:
https://a-parser.com/pages/support/
Possible output formats
A-Parser supports flexible result formatting thanks to its built-in Template Toolkit engine, which can output results in any form, including structured formats such as CSV or JSON.
Outputting the number of email addresses
Result format:
$mailcount
Example result:
4
Possible settings
Parameter name | Default value | Description |
---|---|---|
Good status | All | Selects which server responses are considered successful. If the server returns any other response, the request is retried with a different proxy |
Good code RegEx | - | Ability to specify a regular expression to check the response code |
Ban Proxy Code RegEx | - | Ability to ban a proxy for a certain time (Proxy ban time) based on the server response code |
Method | GET | Request method |
POST body | - | Content to be sent to the server when using the POST method. Supports variables $query - URL query, $query.orig - original query, and $pagenum - page number when using the Use Pages option. |
Cookies | - | Ability to specify cookies for the request. |
User agent | The user-agent of the current version of Chrome is automatically inserted | User-Agent header when requesting pages |
Additional headers | - | Ability to specify arbitrary request headers with support for template engine features and use of variables from the request builder |
Read only headers | ☐ | Read only the response headers. In some cases this saves traffic when the page content does not need to be processed |
Detect charset on content | ☐ | Recognize the encoding based on the page content |
Emulate browser headers | ☐ | Send request headers that emulate a browser |
Max redirects count | 0 | Maximum number of redirects that the parser will follow |
Follow common redirects | ☑ | Allows http <-> https and www.domain <-> domain redirects within the same domain, bypassing the Max redirects count limit |
Max cookies count | 16 | Maximum number of cookies to be saved |
Engine | HTTP (Fast, JavaScript Disabled) | Allows you to choose the HTTP engine (faster, without JavaScript) or Chrome (slower, with JavaScript) |
Chrome Headless | ☐ | If enabled, the browser will not be displayed |
Chrome DevTools | ☑ | Allows you to use Chromium debugging tools |
Chrome Log Proxy connections | ☑ | If enabled, information about Chrome connections is written to the log |
Chrome Wait Until | networkidle2 | Determines when the page is considered loaded |
Use HTTP/2 transport | ☐ | Determines whether to use HTTP/2 instead of HTTP/1.1. For example, Google and Majestic immediately ban requests made over HTTP/1.1 |
Don't verify TLS certs | ☐ | Disables TLS certificate validation |
Randomize TLS Fingerprint | ☐ | This option allows you to bypass site bans by TLS fingerprint |
Bypass CloudFlare | ☑ | Automatic bypass of CloudFlare verification |
Bypass CloudFlare with Chrome(Experimental) | ☐ | Bypass CF through Chrome |
Bypass CloudFlare with Chrome Max Pages | 20 | Max. number of pages when bypassing CF through Chrome |
Subdomains are internal | ☐ | Whether to consider subdomains as internal links |
Follow links | Internal only | Which links to follow |
Skip comment blocks | ☐ | Whether to skip comment blocks |
Search Cloudflare protected e-mails | ☑ | Whether to parse Cloudflare protected e-mails. |
Skip non-HTML blocks | ☑ | Do not collect email addresses in tags (script, style, comment, etc.). |
Skip meta tags | ☐ | Do not collect email addresses in meta tags. |
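The Search Cloudflare protected e-mails option concerns pages where Cloudflare's email obfuscation replaces an address with a hex string in a `data-cfemail` attribute. In the widely known form of this encoding, the first byte is a key and each following byte is one character of the address XOR-ed with that key. The sketch below decodes that format; it illustrates the encoding only, and whether A-Parser decodes it in exactly this way is an assumption:

```python
import re

# Matches the hex payload Cloudflare places in data-cfemail attributes.
CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')

def decode_cfemail(encoded):
    """Decode one obfuscated address: byte 0 is the XOR key for the rest."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")

def find_cfemails(html):
    """Decode every data-cfemail value found in the page."""
    return [decode_cfemail(h) for h in CFEMAIL_RE.findall(html)]

# "a@b.io" encoded with key 0x10
page = '<span class="__cf_email__" data-cfemail="107150723e797f">[email protected]</span>'
print(find_cfemails(page))
```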