HTML::EmailExtractor - Parsing email addresses from website pages

Overview

HTML::EmailExtractor collects email addresses from specified pages. It can navigate through a website's internal pages up to a specified depth, which lets it go through all pages of a website while collecting internal and external links. The email parser has built-in tools for bypassing CloudFlare protection and can use Chrome as the engine for parsing emails from pages where data is loaded by scripts. It can reach a speed of up to 250 requests per minute, which is 15,000 links per hour.
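
The depth-limited navigation described above can be illustrated with a minimal Python sketch. This is not A-Parser's implementation: the in-memory `PAGES` dictionary, the `crawl_emails` function, and the simplified regular expressions are all assumptions made for illustration, and a real crawl would fetch pages over HTTP.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site" standing in for real HTTP responses.
PAGES = {
    "https://example.com/": '<a href="/about">About</a> <a href="/contact">Contact</a> info@example.com',
    "https://example.com/about": '<a href="/team">Team</a> <a href="https://other.example.org/">Partner</a>',
    "https://example.com/contact": "Write to sales@example.com or info@example.com",
    "https://example.com/team": "ceo@example.com",
}

# Simplified patterns, good enough for the sketch but not for production use.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HREF_RE = re.compile(r'href="([^"]+)"')


def crawl_emails(start_url, max_depth, fetch=PAGES.get):
    """Breadth-first walk of internal pages down to max_depth, collecting emails."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    emails = []
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        for email in EMAIL_RE.findall(html):
            if email not in emails:
                emails.append(email)
        if depth >= max_depth:
            continue  # do not follow links below the depth limit
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            # Only internal links (same host) are followed in this sketch.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return emails
```

With a depth of 0 only the start page is read. Each extra level adds the pages linked from the previous level, so `crawl_emails("https://example.com/", 2)` also reaches the team page in this toy data.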

Use cases for HTML::EmailExtractor email parser

Parsing emails from a website with navigation to pages up to a specified limit

Case 1

  1. Add the Parse to level option and select the required value (limit) from the list.
  2. In the Requests section, check the option Unique requests.
  3. In the Results section, check the option Unique by string.
  4. As the request, specify the link to the website from which you need to parse emails.
Download example

How to import an example into A-Parser

eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr

Parsing emails from a website base with navigation of each website to a depth up to a specified limit

Case 2

  1. Add the Parse to level option and select the required value (limit) from the list.
  2. In the Requests section, check the option Unique requests.
  3. In the Results section, check the option Unique by string.
  4. As the request, specify the links to the websites from which you need to parse emails, or set Requests from to File and upload a request file with a website base.
Download example

How to import an example into A-Parser

eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr

Case 3

  1. In the Requests section, check the option Unique requests.
  2. In the Results section, check the option Unique by string.
  3. As the request, specify the links from which you need to parse emails, or set Requests from to File and upload a request file with a links base.
Download example

How to import an example into A-Parser

eJxtU01z0zAQ/S+aHmAmOPTAxbc00wwwaV3a9BRyEPE6COuLXSkpePLfWTmOHZfe
tG/fvv1UI4Kkmh4QCAKJfN0I375FLkqoZNRBTISXSIDJvRafV3fLPL81Uunbl4By
Gxwy5UzebCaCBfhJC4dGJqErf511qr3zSe5h5dhZKQ0DvGDrXhpIUaUMkLxZ1Qq9
e5+Fl6Qgy1IF5azUpwypriHrs1W/Y4qngMrumM8mKqAFOsNwgFYkgX/OFa7FVWsL
lolt/LdTjMgDRpgI4moX3DGUvaOSmtijAqDkERQ+lcR4I5ydab2EPeiB1srfRKVL
nuOs4qAvXeDblOI/jWPf4WWqPeABuYZepbVuirshqnRLt+PGreO2tTIqsE1zF23a
zUcGawDfj+0+0YxD6NN0yl12PhUPtmTmsLWZH6BRG6PNjMGts5XaFdwAqhLOzGhX
fI+FnTvjNaS+bNSat0LwOFzIjLo1JGMo8HXwvE0xuuTgnKavT6dSPSq+wE+pQMOT
vMzaSW6l1s+Py0uPGC6KjZ8heMqn08PhkNV/DaWlZhin3+3Z8wMl4Bjy6Mq4DVuw
4bXLOKpZwoxRqSv5IUBNY5hMpqkVEKnUADvHN8yDPG76P9v/7Obtn5s3R76RX/Rw
oqeBJjJjvBniAxD59fEfH7B6cg==

List of data collected by the email parser

  • Email addresses
  • Total number of addresses on the page
  • Array with all collected pages (used when the Use Pages option is enabled)

Features

  • Multi-page parsing (page navigation)
  • Selection of which links to follow (option Follow links)
  • Navigation through internal site pages to a specified depth (option Parse to level) - allows you to go through all site pages, collecting internal and external links
  • Ability to consider subdomains as internal site pages
  • Supports gzip/deflate/brotli compression
  • Determination and conversion of site encodings to UTF-8
  • Bypassing CloudFlare protection
  • Choice of engine (HTTP or Chrome)
  • Supports all functionality of HTML::LinkExtractor
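
The distinction between internal and external links, including the option to treat subdomains as internal, can be sketched in Python as follows. The `is_internal` function and its `subdomains_are_internal` flag are hypothetical names chosen for illustration. The exact rules A-Parser applies (for example, how it treats `www` or sibling subdomains) are not stated in this document.

```python
from urllib.parse import urlparse


def is_internal(link, site_url, subdomains_are_internal=False):
    """Return True if link belongs to the same site as site_url."""
    link_host = (urlparse(link).hostname or "").lower()
    site_host = (urlparse(site_url).hostname or "").lower()
    if not link_host or not site_host:
        return False
    if link_host == site_host:
        return True
    # Optionally accept hosts such as blog.example.com for site example.com.
    return subdomains_are_internal and link_host.endswith("." + site_host)
```

Matching on `"." + site_host` rather than on the bare host name prevents an unrelated domain such as `notexample.com` from being treated as a subdomain of `example.com`.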

Use cases

  • Parsing email addresses
  • Outputting the number of email addresses

Query examples

Queries must be links to pages, for example:

https://a-parser.com/pages/support/

Possible output formats

A-Parser supports flexible result formatting through the built-in Template Toolkit template engine, so results can be output in any form, including structured formats such as CSV or JSON.
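
To give a rough idea of what structured output might look like, the Python sketch below serializes a list of collected addresses as JSON and as CSV. It is not Template Toolkit syntax and does not use A-Parser's variables. The field names (`url`, `count`, `emails`, `email`) and the `to_json` and `to_csv` helpers are assumptions made for illustration.

```python
import csv
import io
import json


def to_json(url, emails):
    """Serialize one page's results as a JSON object."""
    return json.dumps({"url": url, "count": len(emails), "emails": emails})


def to_csv(url, emails):
    """Serialize one page's results as CSV, one row per address."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["url", "email"])
    for email in emails:
        writer.writerow([url, email])
    return buffer.getvalue()
```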

Outputting the number of email addresses

Result format:

$mailcount

Example result:

4

Possible settings

Parameter name | Default value | Description
Good status | All | Choose which server response will be considered successful. If parsing receives a different response from the server, the request will be repeated with a different proxy
Good code RegEx | - | Ability to specify a regular expression to check the response code
Ban Proxy Code RegEx | - | Ability to ban a proxy for a certain time (Proxy ban time) based on the server response code
Method | GET | Request method
POST body | - | Content to be sent to the server when using the POST method. Supports variables $query - URL query, $query.orig - original query, and $pagenum - page number when using the Use Pages option.
Cookies | - | Ability to specify cookies for the request.
User agent | The user-agent of the current version of Chrome is automatically inserted | User-Agent header when requesting pages
Additional headers | - | Ability to specify arbitrary request headers with support for template engine features and use of variables from the request builder
Read only headers |  | Read only headers. In some cases, it allows you to save traffic if there is no need to process content
Detect charset on content |  | Recognize the encoding based on the page content
Emulate browser headers |  | Emulate browser headers
Max redirects count | 0 | Maximum number of redirects that the parser will follow
Follow common redirects |  | Allows http <-> https and www.domain <-> domain redirects within the same domain, bypassing the Max redirects count limit
Max cookies count | 16 | Maximum number of cookies to be saved
Engine | HTTP (Fast, JavaScript Disabled) | Allows you to choose the HTTP engine (faster, without JavaScript) or Chrome (slower, with JavaScript)
Chrome Headless |  | If enabled, the browser will not be displayed
Chrome DevTools |  | Allows you to use Chromium debugging tools
Chrome Log Proxy connections |  | If enabled, information about chrome connections will be output to the log
Chrome Wait Until | networkidle2 | Determines when the page is considered loaded. More about values.
Use HTTP/2 transport |  | Determines whether to use HTTP/2 instead of HTTP/1.1. For example, Google and Majestic immediately ban if you use HTTP/1.1.
Don't verify TLS certs |  | Disabling TLS certificate validation
Randomize TLS Fingerprint |  | This option allows you to bypass site bans by TLS fingerprint
Bypass CloudFlare |  | Automatic bypass of CloudFlare verification
Bypass CloudFlare with Chrome (Experimental) |  | Bypass CF through Chrome
Bypass CloudFlare with Chrome Max Pages | 20 | Max. number of pages when bypassing CF through Chrome
Subdomains are internal |  | Whether to consider subdomains as internal links
Follow links | Internal only | Which links to follow
Skip comment blocks |  | Whether to skip comment blocks
Search Cloudflare protected e-mails |  | Whether to parse Cloudflare protected e-mails.
Skip non-HTML blocks |  | Do not collect email addresses in tags (script, style, comment, etc.).
Skip meta tags |  | Do not collect email addresses in meta tags.
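
As background for the Search Cloudflare protected e-mails option: Cloudflare's email obfuscation is commonly observed to store an address as a hex string in a `data-cfemail` attribute, where the first byte is an XOR key for the remaining bytes. The Python sketch below decodes that commonly observed format. How A-Parser handles it internally is not described in this document, and the function names and the `encode_cfemail` helper (included only to build sample input) are assumptions.

```python
import re

CFEMAIL_RE = re.compile(r'data-cfemail="([0-9a-fA-F]+)"')


def decode_cfemail(encoded):
    """Decode a hex string where byte 0 is the XOR key for the rest."""
    data = bytes.fromhex(encoded)
    key = data[0]
    return bytes(b ^ key for b in data[1:]).decode("utf-8")


def encode_cfemail(email, key=0x5A):
    """Inverse of decode_cfemail, used here only to produce sample input."""
    data = bytes([key]) + bytes(b ^ key for b in email.encode("utf-8"))
    return data.hex()


def find_protected_emails(html):
    """Return decoded addresses for every data-cfemail attribute in html."""
    return [decode_cfemail(match) for match in CFEMAIL_RE.findall(html)]


# Sample markup built with the helper above, mimicking an obfuscated link.
sample = '<a class="__cf_email__" data-cfemail="%s">[email protected]</a>' % encode_cfemail("info@example.com")
decoded = find_protected_emails(sample)
```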