HTML::EmailExtractor - Scraping email addresses from website pages
Overview of the scraper
HTML::EmailExtractor collects email addresses from the specified pages. It can navigate through the internal pages of a website down to a specified depth, which allows it to cover all pages of the site while collecting internal and external links. The scraper has built-in means of bypassing CloudFlare protection and can use Chrome as its engine to scrape emails from pages where the data is loaded by scripts. It is capable of reaching speeds of up to 250 requests per minute, which is 15,000 links per hour.
Use cases for the scraper
Scraping emails from a website, navigating through its pages down to a specified depth
- Add the option Parse to level and select the required value (depth limit) from the list.
- In the Queries section, check the Unique queries option.
- In the Results section, check the Unique per line option.
- As a query, specify the link to the website from which you need to scrape emails.
Download example
How to import an example into A-Parser
eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr
Scraping emails from a database of websites, navigating through their pages down to a specified depth
- Add the option Parse to level and select the required value (depth limit) from the list.
- In the Queries section, check the Unique queries option.
- In the Results section, check the Unique per line option.
- As queries, specify the links to the websites from which you need to scrape emails, or select File in Queries from and upload a file with a database of websites.
Download example
How to import an example into A-Parser
eJxtU01z2jAQ/S8aDu0MY5pDL74RJkzTIXGakBPDQYPXREWWVEmGpB7+e98Kx4Ym
N+3u2/f2S62IMuzCg6dAMYh81QqX3iIXJVWy0VGMhZM+kOfwSvxY3i3y/KaWSt+8
Ri830XpAenAr4psjpFsXlTUBMVXCTBwL2pOGZy91A8zVcb0eC+ghM8ytryXrjtxV
1hXRB5/knpYWwUppGtxzWPeyZrlRKSNxNKsS0ZevWXxlBlmWiiuR+qTAbQyqz0b9
4VJEiF6ZLfAwvaIw97aGO1IiYefbe4UrMUq2AE2T8n+dckQefUNjEVDtHAOisg9U
UgdEVCQvMbGiG07eCmumWqfBDLBEf90oXWLs0wpJt13i55DiA8ex7/Bcak/+4FFD
z5Ks6+JuyCrtwm7RuLFoW6taRdhhZhvDu/kG547I9WO7Z1htPfUyHXOnjstyZPgA
hq1N3eC6aONiM5fOjTWV2hZowKuS3pGNWeJ8CzOztdPEfZlGa2wl0ONwIdPQrYGN
ocD/k2dJ4uLwo7U6/Hw6leq8wgV+5wJrTPJctaPcSK2fHxfnETFcFIyXGF3IJ5PD
4ZDt/taBl5r5ZiI4N9LW4qjQ2XHd/7n+Z7af/7y8PWJpv8PDCc4dMhg+jCpgI/zL
/gFm02Dr
Scraping emails from a database of links
- In the Queries section, check the Unique queries option.
- In the Results section, check the Unique per line option.
- As queries, specify the links from which you need to scrape emails, or select File in Queries from and upload a file with a link database.
Download example
How to import an example into A-Parser
eJxtU01z0zAQ/S+aHmAmOPTAxbc00wwwaV3a9BRyEPE6COuLXSkpePLfWTmOHZfe
tG/fvv1UI4Kkmh4QCAKJfN0I375FLkqoZNRBTISXSIDJvRafV3fLPL81Uunbl4By
Gxwy5UzebCaCBfhJC4dGJqErf511qr3zSe5h5dhZKQ0DvGDrXhpIUaUMkLxZ1Qq9
e5+Fl6Qgy1IF5azUpwypriHrs1W/Y4qngMrumM8mKqAFOsNwgFYkgX/OFa7FVWsL
lolt/LdTjMgDRpgI4moX3DGUvaOSmtijAqDkERQ+lcR4I5ydab2EPeiB1srfRKVL
nuOs4qAvXeDblOI/jWPf4WWqPeABuYZepbVuirshqnRLt+PGreO2tTIqsE1zF23a
zUcGawDfj+0+0YxD6NN0yl12PhUPtmTmsLWZH6BRG6PNjMGts5XaFdwAqhLOzGhX
fI+FnTvjNaS+bNSat0LwOFzIjLo1JGMo8HXwvE0xuuTgnKavT6dSPSq+wE+pQMOT
vMzaSW6l1s+Py0uPGC6KjZ8heMqn08PhkNV/DaWlZhin3+3Z8wMl4Bjy6Mq4DVuw
4bXLOKpZwoxRqSv5IUBNY5hMpqkVEKnUADvHN8yDPG76P9v/7Obtn5s3R76RX/Rw
oqeBJjJjvBniAxD59fEfH7B6cg==
Collected data
- Email addresses
- Total number of addresses on the page
- Array with all collected pages (used when Use Pages option is enabled)
Capabilities
- Multi-page scraping (pagination)
- Navigation through internal site pages down to a specified depth (option Parse to level), which allows covering all pages of a site while collecting internal and external links
- Determining the follow (dofollow/nofollow) status of links
- Limit on page transitions (option Follow links limit)
- Ability to consider subdomains as internal site pages
- Supports gzip/deflate/brotli compression
- Detection and conversion of site encodings to UTF-8
- CloudFlare protection bypass
- Choice of engine (HTTP or Chrome)
- Supports all the functionality of HTML::LinkExtractor
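The depth-limited traversal behind the Parse to level option can be illustrated with a short sketch. This is not A-Parser's implementation: the fetch step is simulated with an in-memory url-to-HTML map, and the function name crawl is made up for the example. It shows the internal/external distinction (only same-domain links are followed, matching the default Follow links = Internal only setting) and the depth cutoff.

```python
import re
from urllib.parse import urljoin, urlparse

HREF_RE = re.compile(r'href="([^"]+)"')

def crawl(pages: dict, start: str, max_depth: int) -> list:
    """Breadth-first traversal of internal links up to max_depth levels.
    `pages` simulates fetched HTML (url -> body) instead of real HTTP."""
    domain = urlparse(start).netloc
    visited = [start]
    frontier = [(start, 0)]
    while frontier:
        url, depth = frontier.pop(0)
        if depth >= max_depth:          # the "Parse to level" cutoff
            continue
        for href in HREF_RE.findall(pages.get(url, "")):
            link = urljoin(url, href)
            # follow internal links only, like the default Follow links setting
            if urlparse(link).netloc == domain and link not in visited:
                visited.append(link)
                frontier.append((link, depth + 1))
    return visited

site = {
    "https://example.com/":  '<a href="/a">A</a> <a href="https://other.com/">X</a>',
    "https://example.com/a": '<a href="/b">B</a>',
    "https://example.com/b": "",
}
print(crawl(site, "https://example.com/", 1))  # one level deep, external link skipped
print(crawl(site, "https://example.com/", 2))  # reaches /b at the second level
```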
Use cases
- Email address scraping
- Displaying the number of email addresses
Queries
As queries, specify links to pages, for example:
https://a-parser.com/pages/support/
Output results examples
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit, which allows it to output results in any form, as well as in structured formats such as CSV or JSON.
Displaying the number of email addresses
Result format:
$mailcount
Example of result:
4
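For illustration, here is a minimal Python sketch of the kind of extraction that produces such a count. The regular expression is an approximation for the example only, not the pattern A-Parser actually uses:

```python
import re

# Permissive e-mail pattern; real scrapers use more elaborate rules,
# so treat this as an illustrative approximation only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def extract_emails(html: str) -> list:
    """Return unique e-mail addresses found in an HTML string, in order."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        seen.setdefault(match, None)   # dict preserves insertion order
    return list(seen)

page = '<a href="mailto:support@a-parser.com">support@a-parser.com</a>'
emails = extract_emails(page)
print(len(emails))   # count of unique addresses, analogous to $mailcount
print(emails)        # → ['support@a-parser.com']
```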
Possible settings
Parameter Name | Default Value | Description |
---|---|---|
Good status | All | Selects which server responses are considered successful. If a different response is received during scraping, the request will be retried with a different proxy |
Good code RegEx | | Ability to specify a regular expression to check the response code |
Ban Proxy Code RegEx | | Ability to temporarily ban a proxy (for Proxy ban time) based on the server response code |
Method | GET | Request method |
POST body | | Content to send to the server when using the POST method. Supports the variables $query (URL query), $query.orig (original query), and $pagenum (page number when the Use Pages option is used) |
Cookies | | Ability to specify cookies for the request |
User agent | _Automatically substituted user-agent of the current Chrome version_ | User-Agent header when requesting pages |
Additional headers | | Ability to specify custom request headers, with support for templating and variables from the query builder |
Read only headers | ☐ | Read headers only. In some cases, it saves traffic if there is no need to process content |
Detect charset on content | ☐ | Detect charset based on the content of the page |
Emulate browser headers | ☐ | Emulate browser headers |
Max redirects count | 0 | Maximum number of redirects the scraper will follow |
Follow common redirects | ☑ | Allows for http <-> https and www.domain <-> domain redirects within the same domain, bypassing the Max redirects count limit |
Max cookies count | 16 | Maximum number of cookies to save |
Engine | HTTP (Fast, JavaScript Disabled) | Allows choosing between the HTTP engine (faster, without JavaScript) or Chrome (slower, JavaScript enabled) |
Chrome Headless | ☐ | If this option is enabled, the browser will not be displayed |
Chrome DevTools | ☑ | Allows the use of Chromium debugging tools |
Chrome Log Proxy connections | ☑ | If this option is enabled, information about Chrome connections will be logged |
Chrome Wait Until | networkidle2 | Determines when the page is considered loaded. More about the values. |
Use HTTP/2 transport | ☐ | Determines whether to use HTTP/2 instead of HTTP/1.1. For example, Google and Majestic immediately ban if HTTP/1.1 is used. |
Don't verify TLS certs | ☐ | Disable TLS certificate validation |
Randomize TLS Fingerprint | ☐ | This option allows bypassing site bans by TLS fingerprint |
Bypass CloudFlare | ☑ | Automatic bypass of CloudFlare checks |
Bypass CloudFlare with Chrome(Experimental) | ☐ | Bypass CF through Chrome |
Bypass CloudFlare with Chrome Max Pages | 20 | Max. number of pages when bypassing CF through Chrome |
Subdomains are internal | ☐ | Whether to consider subdomains as internal links |
Follow links | Internal only | Which links to follow |
Follow links limit | 0 | Follow links limit, applied to each unique domain |
Skip comment blocks | ☐ | Whether to skip comment blocks |
Search Cloudflare protected e-mails | ☑ | Whether to scrape Cloudflare-protected email addresses |
Skip non-HTML blocks | ☑ | Do not collect email addresses in tags (script, style, comment, etc.). |
Skip meta tags | ☐ | Do not collect email addresses in meta tags. |
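The Search Cloudflare protected e-mails option deals with Cloudflare's email obfuscation, where the address is stored as a hex string in a data-cfemail attribute, XOR-ed with a key held in the string's first byte. The decoding scheme is publicly known; the sketch below (with helper names made up for the example) shows how such an address can be recovered:

```python
def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare-obfuscated e-mail (data-cfemail attribute value)."""
    key = int(encoded[:2], 16)              # first byte is the XOR key
    return "".join(
        chr(int(encoded[i:i + 2], 16) ^ key)
        for i in range(2, len(encoded), 2)
    )

def encode_cfemail(email: str, key: int = 0x42) -> str:
    """Inverse transform, handy for round-trip testing."""
    return f"{key:02x}" + "".join(f"{ord(c) ^ key:02x}" for c in email)

print(decode_cfemail(encode_cfemail("test@example.com")))  # → test@example.com
```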