Rank::Ahrefs::BrokenLinks - Ahrefs Broken Link Checker Scraper

Parser overview

Rank::Ahrefs::BrokenLinks retrieves the number of broken inbound and outbound links for a domain, along with a list of the top 10 links of each kind.

This scraper retrieves data from the page https://ahrefs.com/broken-link-checker.

Requires a connection to a captcha-solving service via Util::Turnstile.

Collected data

  • Number of broken inbound links and the percentage of dofollow among them
  • Number of broken outbound links and the percentage of dofollow among them
  • Top 10 broken inbound links and their characteristics
  • Top 10 broken outbound links and their characteristics

Capabilities

  • Automatic proxy support
  • Choice of query type

Use cases

  • Retrieving data about broken links

Queries

Domains should be used as queries, for example:

yep.com
a-parser.com

Output results examples

A-Parser supports flexible result formatting through the built-in Template Toolkit templating engine, which can output results in an arbitrary form as well as in structured formats such as CSV or JSON.

Outputting the number of broken inbound and outbound links

Result format:

$query: inbound - $in, outbound - $out\n

Result example:

a-parser.com: inbound - 646, outbound - 1300
yep.com: inbound - 236, outbound - 0
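
If the results file is processed further outside A-Parser, lines in this format can be read back with a regular expression. The Python sketch below assumes the exact result format shown above; the function name parse_counts is hypothetical, and lines that do not match the format are skipped.

```python
import re

# Mirrors the result format "$query: inbound - $in, outbound - $out"
LINE_RE = re.compile(
    r"^(?P<query>\S+): inbound - (?P<inbound>\d+), outbound - (?P<outbound>\d+)$"
)


def parse_counts(text):
    """Map each query to its (inbound, outbound) broken-link counts."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not follow the format are ignored
            counts[match["query"]] = (int(match["inbound"]), int(match["outbound"]))
    return counts


sample = """a-parser.com: inbound - 646, outbound - 1300
yep.com: inbound - 236, outbound - 0"""

if __name__ == "__main__":
    for query, (inbound, outbound) in parse_counts(sample).items():
        print(f"{query}: {inbound + outbound} broken links in total")
```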

Outputting a list of broken inbound links with some additional parameters to a CSV table

The built-in utility $tools.CSVline builds correctly escaped CSV rows, so the resulting file is ready for import into Excel or Google Sheets.

Result format:

[% FOREACH item IN p1.inbound;
tools.CSVline(item.rank, item.rating, item.traffic, item.from, item.to);
END %]

File name:

$datefile.format().csv

Initial text:

Rank,Rating,Traffic,From,To

Result example:

Rank,Rating,Traffic,From,To
50,93,28333.153498,https://blog.hubspot.com/marketing/top-search-engines,https://yep.com/about
23,6,0,http://lagrilladeariegeoise.com/spip.php?article5,http://user1481732362576.yep.com/blog/405236_General/1859660_4_Tips_for_overwatch_boosting
20,76,2.862819,https://www.abondance.com/20220607-47814-ahrefs-sort-yep-son-moteur-de-recherche-concurrent-de-google-et-bing.html,https://yep.com/settings
15,33,0,http://www.annieshomepage.com/shalloweenlinks.html,http://www.yep.com/cgi-bin/displayRank_yep.cgi?Religion/ranking/25
14,33,0,http://www.annieshomepage.com/halloween2.html,http://www.yep.com/cgi-bin/displayRank_yep.cgi?Religion/ranking/25
14,33,0,http://www.annieshomepage.com/halloweenlinks.html,http://www.yep.com/cgi-bin/displayRank_yep.cgi?Religion/ranking/25
13,92,0,https://sourceforge.net/p/jmdns/bugs/110/,http://northfacecoat.yep.com/
13,11,0.088871,http://alain-pire.be/WordPress/?p=27,http://gamesgratis.yep.com/blog
13,11,0.088871,http://alain-pire.be/WordPress/?p=27,http://hoteljobs.yep.com/blog/69066/104644
12,34,-1,https://earlyinvesting.com/search-engine-market-is-waiting-be-disrupted/,https://yep.com/about
14,32,0,https://s2.openssource.cc/threads/a-parser-universalnyj-mnogopotochnyj-parser-parsing-ljubyx-dannyx.136378/,https://a-parser.com/wiki/rank-semrush/
11,52,0,https://www.gofuckbiz.com/showthread.php?t=30454,http://a-parser.com/projects/a-parser/wiki
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-5,https://a-parser.com/wiki/rank-linkpad/
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-5,https://a-parser.com/wiki/rank-semrush/
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-5,https://a-parser.com/wiki/se-bing-langdetect/
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-6,https://a-parser.com/docs/javascript-parsers/class-methods-v2
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-6,https://a-parser.com/docs/parsers/google-maps
11,32,0,https://s2.openssource.cc/threads/a-parser-1-1-prodvinutyj-parser-poiskovyx-sistem-suggest-pr-dmoz-whois-etc.19351/page-6,https://a-parser.com/docs/parsers/rank-linkpad
10,73,0,https://forum.bits.media/index.php?/profile/230848-_forbidden_/content/&type=forums_topic_post,https://a-parser.com/wiki/rank-semrush/
10,73,0,https://forum.bits.media/index.php?/topic/183422-a-parser-%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B9-%D0%BC%D0%BD%D0%BE%D0%B3%D0%BE%D0%BF%D0%BE%D1%82%D0%BE%D1%87%D0%BD%D1%8B%D0%B9-%D0%BF%D0%B0%D1%80%D1%81%D0%B5%D1%80/,https://a-parser.com/wiki/rank-semrush/
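
A file like this can be post-processed with any CSV reader. As an illustration, the Python sketch below groups referring pages by the broken URL they point to. It assumes the five-column header from the "Initial text" above and numeric values in the first three columns, as in the sample. The function names are hypothetical. The sample contains a Traffic value of -1; the source does not explain it, so the sketch keeps it unchanged.

```python
import csv
import io


def load_rows(csv_text):
    """Parse the exported CSV into dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({
            "rank": int(row["Rank"]),
            "rating": int(row["Rating"]),
            "traffic": float(row["Traffic"]),  # kept as-is, including -1
            "from": row["From"],
            "to": row["To"],
        })
    return rows


def broken_targets(rows):
    """Group referring pages by the broken target URL they link to."""
    targets = {}
    for row in rows:
        targets.setdefault(row["to"], []).append(row["from"])
    return targets


sample = """Rank,Rating,Traffic,From,To
50,93,28333.153498,https://blog.hubspot.com/marketing/top-search-engines,https://yep.com/about
12,34,-1,https://earlyinvesting.com/search-engine-market-is-waiting-be-disrupted/,https://yep.com/about
13,92,0,https://sourceforge.net/p/jmdns/bugs/110/,http://northfacecoat.yep.com/
"""

if __name__ == "__main__":
    for target, sources in broken_targets(load_rows(sample)).items():
        print(f"{target}: {len(sources)} referring page(s)")
```

Grouping by target shows which broken URLs attract the most links and are therefore the first candidates for a redirect or a restored page.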

Download example

How to import the example into A-Parser

eJx9VFtvmzAU/ivIaqVWYmidtBf2RFjROmWhI8lekqjy4JB6MbZnm2wRyn/vMRBI
2mlv/s71Ozc3xFKzM48aDFhDwlVDVPsmISmgpDW3xCeKagPaqVcko2IXhtGzhtKE
4UTLHYgpEzuDdoNHQ+xBAcaQe9CaFYBKViBeWsafMoipsvkz/fDUJ/PJnvIazpIe
NxufoBLfJpG6oo7R6tpL0uw+ir94zELlPcw8dRcw8VPWovi0Fp7nWSm5CeL5D84E
3DirQCNj3+ufloltD6ymZcnyHpVaVieFvMVg97PP3vWGDCzmdA8LiSxKxmEUJ4hm
tHLcrwpqwWmDsmV8cxvkZo+mtCiYZVJQ3pXiOjmWtxTsd1u7kGiLT83AJEgHRRb+
2l54OLVhRa5aTDBE3fp+73xIWFJuwCcGqSYUiRSvNVgf9kDqVDk+KG+IFBHnU9gD
H83a+JOa8QLHHpXo9NA7/tskfRPjOJR3ngq34Y9GDkOUFk3Sb6NXIadye2oGZxWz
iE2ME3Yb8B6FOwA19GzmzCqpYUhjdQ1DclxsBcJtnttbP+vmv+gn75rsL+RajOOM
VG9PXpV4MbJLYS5FybZpv+ony1os8LRSEctKcXA1i5pzHJmBbFydyPQjcmBoyxvn
uE1xcZTtpn+dd1SVZriaHx3BCrt8nrUPmVPOl9n0XEPGdUNwABXksloL+q67d4eI
c7SwlbhuWNZxM3wGw5fR/OdLCJsjjvSXeex8XI2tEO8dIc6LhHfHF1MeluY=

Tip

The result format uses the Template Toolkit templating engine to iterate over the inbound array (p1.inbound in the template) in a FOREACH loop.

In the results filename, you just need to change the file extension to csv.

For the "Initial text" option to be available in the Task Editor, you need to activate "More options". In "Initial text", write the column names separated by commas and make the second line blank.

Possible Settings

Parameter | Default value | Description
Util::Turnstile preset | default | Selecting a Util::Turnstile preset to bypass captchas. You must first configure the Util::Turnstile scraper (specify your access key and other parameters), then select the created preset here.
Turnstile pass proxy | | Passing a proxy to the solving service.
Mode | *.domain/* | Choice of query type
Do not search for sitekey | | Experimental option; disables the search for the captcha sitekey, which in turn speeds up job start. In case of problems like an invalid sitekey, this option should be disabled.
Additional headers | | Option to specify arbitrary request headers