
Check::BackLink - checks for the presence of a link (or links) on pages from a link database

Overview of the scraper


The scraper checks backlinks, that is, links on other websites' pages that point to your site.

A-Parser's functionality lets you save scraping settings as presets for future use, schedule scraping, and much more.

Results can be saved in whatever form and structure you need, thanks to the powerful built-in Template Toolkit template engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Use cases for the scraper

Collected data

  • The total number of external and internal links on the page
  • Presence of the link on the specified page: 0 or 1
    • 0 - no exact match of the backlink was found
    • 1 - an exact match of the backlink was found
  • Whether the specified page is blocked by robots.txt: 0 or 1
  • Whether the page is blocked from indexing by the robots meta tag with the noindex value, and whether link following is blocked by its nofollow value
  • Whether following the found link is blocked by the rel=nofollow attribute
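As a rough illustration of how these flags could be derived from a page's HTML, here is a minimal Python sketch built on the standard html.parser module. It is not A-Parser's implementation: the BacklinkChecker class and check_page function are hypothetical, and the sketch assumes an exact string comparison between each href and the target link.

```python
from html.parser import HTMLParser


class BacklinkChecker(HTMLParser):
    """Hypothetical sketch: exact-match link search plus robots/nofollow flags."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.exists = 0         # 1 if some <a href> equals the target exactly
        self.nofollow = 0       # 1 if a matching <a> carries rel="nofollow"
        self.meta_noindex = 0   # 1 if <meta name="robots"> contains noindex
        self.meta_nofollow = 0  # 1 if <meta name="robots"> contains nofollow

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            directives = {d.strip().lower() for d in attrs.get("content", "").split(",")}
            if "noindex" in directives:
                self.meta_noindex = 1
            if "nofollow" in directives:
                self.meta_nofollow = 1
        elif tag == "a" and attrs.get("href") == self.target:
            self.exists = 1
            # rel is a space-separated token list, e.g. "nofollow ugc".
            if "nofollow" in attrs.get("rel", "").lower().split():
                self.nofollow = 1


def check_page(html: str, target: str) -> dict:
    checker = BacklinkChecker(target)
    checker.feed(html)
    checker.close()
    return {
        "exists": checker.exists,
        "nofollow": checker.nofollow,
        "meta_noindex": checker.meta_noindex,
        "meta_nofollow": checker.meta_nofollow,
    }


page = ('<meta name="robots" content="noindex, follow">'
        '<a href="https://example.com/" rel="nofollow ugc">anchor</a>')
print(check_page(page, "https://example.com/"))
```

Because the comparison is exact, a missing trailing slash or a different scheme is enough to produce exists = 0.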

Additional data that can be obtained:

  • The number of external and internal links on the page
  • A list of all external and internal links on the page
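A minimal sketch of how page links might be split into internal and external ones, assuming (hypothetically) that a link is internal when its host equals the host of the checked page. The real HTML::LinkExtractor rules, for example around subdomains, may differ, and split_links is a hypothetical helper.

```python
from urllib.parse import urljoin, urlsplit


def split_links(page_url: str, hrefs: list[str]) -> tuple[list[str], list[str]]:
    """Classify hrefs as internal or external by comparing host names."""
    page_host = urlsplit(page_url).hostname
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)         # resolve relative links
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):  # skip mailto:, javascript:, etc.
            continue
        if parts.hostname == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


internal, external = split_links(
    "https://example.com/blog/",
    ["/about", "post-1", "https://other.org/x", "mailto:a@b.c"],
)
print(len(internal), len(external))
```

The counts are then just the lengths of the two lists, and their sum is the total number of links on the page.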

Capabilities

  • Checking for the presence of a link on the specified page, including matching a link without a scheme by substring
  • Checking whether the page is blocked from indexing by robots.txt
  • Checking the robots meta tag for the noindex and nofollow values
  • Checking the found link for rel=nofollow
  • Searching for a link by substring
  • Setting a custom User-Agent header
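The difference between exact and substring matching can be sketched in a few lines of Python. This is a hypothetical helper, not A-Parser's code; it assumes substring mode is plain string inclusion, which is what lets a target without a scheme match a full URL.

```python
from typing import Optional


def find_link(hrefs: list[str], target: str, by_substring: bool = False) -> Optional[str]:
    """Return the first href matching target, or None (hypothetical helper)."""
    for href in hrefs:
        if by_substring:
            # Substring inclusion: "lenta.ru/news" matches "https://lenta.ru/news/..."
            if target in href:
                return href
        elif href == target:
            # Exact match: scheme, host, path and trailing slash must all agree.
            return href
    return None


hrefs = ["https://lenta.ru/news/2020/12/18/lavina/", "https://vc.ru/job"]
print(find_link(hrefs, "lenta.ru/news/2020/12/18/lavina", by_substring=True))
print(find_link(hrefs, "lenta.ru/news/2020/12/18/lavina"))
```

Substring mode is more forgiving but can also match unintended URLs that merely contain the target string.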

Use cases

  • Checking the placement of your links on specified pages
  • Searching for links that are shown only to a specific User-Agent (for example, Googlebot)

Queries

Each query consists of the page to search and the link to look for on it, separated by a space:

https://fishki.net/ https://lenta.ru/news/2020/12/18/lavina/
https://en.wikipedia.org/wiki/Moscow https://lenta.ru/news/2005/12/23/city/
http://soccerjerseys.in.net/ https://lenta.ru/news/2012/03/12/homeless/
https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/
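If you prepare query files with a script, a minimal Python sketch of splitting each line into its two parts might look like this. parse_query is a hypothetical helper that assumes exactly one whitespace-separated pair per line.

```python
def parse_query(line: str) -> tuple[str, str]:
    """Split a query into (page to search, link to look for).

    A line with only one URL raises ValueError when unpacking.
    """
    page, link = line.split(maxsplit=1)
    return page, link.strip()


queries = """https://fishki.net/ https://lenta.ru/news/2020/12/18/lavina/
https://en.wikipedia.org/wiki/Moscow https://lenta.ru/news/2005/12/23/city/"""
for page, link in map(parse_query, queries.splitlines()):
    print(f"search {page} for {link}")
```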

Query substitutions

You can use built-in macros to automatically substitute subqueries from files. For example, to check a database of pages for links to one or more sites, specify the list of pages on which to search for links as queries:

https://fishki.net/
https://en.wikipedia.org/wiki/Moscow
http://soccerjerseys.in.net/
https://tjournal.ru/

In the query format, specify a macro that substitutes additional queries from the file backlinks.txt. This way, every page in the database is checked for every link listed in the file:

$query {subs:backlinks}

For each original query, this macro creates one additional query per line in backlinks.txt, so the total number of queries is [number of original queries (page URLs)] x [number of lines in backlinks.txt].
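The multiplication can be illustrated with a short Python sketch that pairs every page with every link, as the macro does. expand_subs is a hypothetical helper, the two backlinks.txt lines are made-up sample contents, and the ordering A-Parser actually uses may differ.

```python
from itertools import product


def expand_subs(pages: list[str], links: list[str]) -> list[str]:
    """Pair every page with every link, mimicking '$query {subs:backlinks}'."""
    return [f"{page} {link}" for page, link in product(pages, links)]


pages = ["https://fishki.net/", "https://en.wikipedia.org/wiki/Moscow",
         "http://soccerjerseys.in.net/", "https://tjournal.ru/"]
backlinks = ["https://lenta.ru/news/2020/12/18/lavina/",   # hypothetical file contents
             "https://lenta.ru/news/2005/12/23/city/"]
queries = expand_subs(pages, backlinks)
print(len(queries))  # 4 pages x 2 links = 8 queries
```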

You can also specify a protocol in the query format, so that bare domains can be used as queries:

http://$query 

This format will prepend http:// to each query.
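Python's standard string.Template happens to use the same $name placeholder syntax, so it gives a compact way to preview what a query format produces. This is an analogy, not A-Parser's engine: apply_query_format is a hypothetical helper, and macros such as {subs:...} are simply left untouched rather than expanded.

```python
from string import Template


def apply_query_format(query_format: str, queries: list[str]) -> list[str]:
    """Substitute each query into a format such as 'http://$query'."""
    template = Template(query_format)
    return [template.substitute(query=q.strip()) for q in queries]


print(apply_query_format("http://$query", ["fishki.net", "tjournal.ru"]))
# ['http://fishki.net', 'http://tjournal.ru']
```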

Output Options Examples

A-Parser supports flexible result formatting thanks to the built-in Template Toolkit engine, which can output results in any form, including structured formats such as CSV or JSON.

Default Output

Result format:

$backlink - $checklink: $exists, blocked by robots.txt: $robots\n

An example of the result, showing the page being checked ($backlink), the link searched for on it ($checklink), whether the link was found, and whether the page is blocked by robots.txt:

http://soccerjerseys.in.net/ - https://lenta.ru/news/2012/03/12/homeless/: 1, blocked by robots.txt: 0
https://tjournal.ru/ - https://lenta.ru/articles/2016/02/15/deathlab/: 0, blocked by robots.txt: 0
https://en.wikipedia.org/wiki/Moscow - https://lenta.ru/news/2005/12/23/city/: 0, blocked by robots.txt: 0
https://fishki.net/ - https://lenta.ru/news/2020/12/18/lavina/: 0, blocked by robots.txt: 0
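To see how the default format maps result fields onto a line of output, here is a Python sketch that reuses the same placeholders with string.Template. It only mirrors the simple $variable substitution; A-Parser itself renders the format with Template Toolkit, and the row dict is a hypothetical stand-in for one result.

```python
from string import Template

# Same placeholders as the default result format; "\n" ends each line.
DEFAULT_FORMAT = Template("$backlink - $checklink: $exists, blocked by robots.txt: $robots\n")


def format_default(result: dict) -> str:
    """Render one result; substitute() raises KeyError if a field is missing."""
    return DEFAULT_FORMAT.substitute(result)


row = {
    "backlink": "http://soccerjerseys.in.net/",
    "checklink": "https://lenta.ru/news/2012/03/12/homeless/",
    "exists": 1,
    "robots": 0,
}
print(format_default(row), end="")
```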

Outputting Backlink Presence and Additional Parameters for Analyzing Backlinks and Backlink Pages to a CSV Table

The built-in $tools.CSVline utility lets you create well-formed CSV documents, ready for import into Excel or Google Sheets.

The $actualchecklink variable is populated only if the backlink is found on the page; otherwise its value is none. $actualbacklink and $actualchecklink contain the actual URLs after any redirects.

Result format:

[% tools.CSVline(backlink, checklink, anchor, nofollow, noindex, redirect, exists, robots, actualbacklink, actualchecklink, intcount, extcount) %]

File name:

$datefile.format().csv

Initial text:

Backlink,Checklink,Anchor,Nofollow,Noindex,Redirect,Exists,Robots,Actualbacklink,Actualchecklink,Intlinks count,Extlinks count

Example of the result:

https://tjournal.ru/,https://lenta.ru/articles/2016/02/15/deathlab/,none,0,0,0,0,0,https://tjournal.ru/,none,112,37
https://fishki.net/,https://lenta.ru/news/2020/12/18/lavina/,none,0,0,0,0,0,https://fishki.net/,none,966,31
http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,"get more information",0,0,0,1,0,http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,89,20
https://en.wikipedia.org/wiki/Moscow,https://lenta.ru/news/2005/12/23/city/,none,0,0,0,0,0,https://en.wikipedia.org/wiki/Moscow,none,2733,598
...
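To prototype or post-process the same table outside A-Parser, here is a minimal Python sketch using the standard csv module. It is a stand-in for tools.CSVline, not its implementation: csv.writer quotes only fields that need it (for example, ones containing commas), so quoting can differ from the example above, where the anchor is quoted. Missing values are written as none to match the example rows; the input dicts keyed by template variable names are an assumption.

```python
import csv
import io

# Variable names in the order used by the tools.CSVline call above.
COLUMNS = ["backlink", "checklink", "anchor", "nofollow", "noindex", "redirect",
           "exists", "robots", "actualbacklink", "actualchecklink", "intcount", "extcount"]
# Column titles from the "Initial text" option.
HEADER = ["Backlink", "Checklink", "Anchor", "Nofollow", "Noindex", "Redirect",
          "Exists", "Robots", "Actualbacklink", "Actualchecklink",
          "Intlinks count", "Extlinks count"]


def to_csv(results: list[dict]) -> str:
    """Write a header row plus one row per result; absent fields become "none"."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for result in results:
        writer.writerow([result.get(col, "none") for col in COLUMNS])
    return buf.getvalue()


row = {"backlink": "https://tjournal.ru/",
       "checklink": "https://lenta.ru/articles/2016/02/15/deathlab/",
       "nofollow": 0, "noindex": 0, "redirect": 0, "exists": 0, "robots": 0,
       "actualbacklink": "https://tjournal.ru/", "intcount": 112, "extcount": 37}
print(to_csv([row]), end="")
```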

How to import an example into A-Parser

eJx9VE1v4jAQ/SuR1UqtRGOg6mqVG6AidUWhS9u9UA5uMgE3jp21HaBC/Pcd5xPK
7t484zdvxjNvvCeWmcQ8aTBgDQkWe5IVZxKQ+x1LMwFeuIYw8d5ZmAguE+OxKPIy
plkKFrQhHYKGcadgsSAjBw6CIaIniMbbCGKWC0uWyw5BajyasdIpcykWlzeeVUoY
f/T8C9nhKuv5daaOh0aRvLGYDNdKF0epYiWE2lYGlxHsirOGiGsIbWHAjhtrSr96
V9WRhTZn4iRP6TrNxqUNVS5rptK49m4ul6R5yjPbwIvCp8RcQOseozXFDuHFRcQs
uFs/Lp59de3bnUUo9pFbriQTZT9cA9sevUr+O3fxUiEWj5qDGWuVostCQeCcn3Uv
F+SisAlS5EXszzKGBDETBjrEYKljhoVEX284DpJZpWeZqwf9e6LkQIgJbEC0sIJ/
mHMR4bQHMQY9VIF/h8zOOA7N845TbUBvNdbQsBTWcPbYRkVqolZ1MwRPuUXbjNxA
0NtFZwKQNT2bOliqNDRprM6hSY5yz0BGCBzWGhg1kx+UGpvW+ppW2prXurovNTUv
9TQ41dLgi44epC23plTS/e7YfJOtYgZZVRL50sUTVZw6QyVjvpph/zSPoEbm8gV3
eiZHyq2va6vMhUBVGJi36hyYSgXOaDp/FjwqUmBZ9Rp3SLGwP57LUjPNUf13rsAU
B3mctaIMmRCv88nxDWkVjcba2swElMbcrBPuS7DUq30CpGW+zqmEraH9br9Le33a
+04F23DJ6JuskSD9LU94hlNivtIr6iz6qEyotv+k6945uv4tDbn9rMgQZlQYgv7A
PsOnwW/g/zUhQ/fW8axVCgKMOarKfqhc44Y7+DkB05aHGOFIvtEuPuyORsDsWrB3
SlzrLKwU7jQO9rBsPtrmt96ffbfB/oDb8mGeSqSbrcOhD0VicBVI0Dv8AQ3PGZI=

tip

The result format uses Template Toolkit.

See also: What is the result format.

In the results file name, simply change the file extension to csv.

To make the "Initial text" option available in the Task Editor, enable "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.

Dumping External Links from the Backlink Page into JSON

Result format:

[% data = {}; 
data.query = query; data.links = [];
FOREACH item IN extlinks;
data.links.push(item.link);
END;
IF !firstString;
",\n";
ELSE;
firstString = 0;
END;
data.json %]

Initial text:

[% firstString = 1 %][

Final text:

]

Example of the result:

[{"query":"https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/","links":["https://vc.ru/job","https://vc.ru/job/new","https://vc.ru/job","https://twitter.com/aktroitsky","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://t.co/fD4AiCpbrV","https://twitter.com/aktroitsky/statuses/1382294384931188748"]}]
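The same JSON structure can be produced in Python with the standard json module, which may help when checking or transforming exported data. This is a sketch under assumptions: each result is taken to be a dict with a "query" string and an "extlinks" list of {"link": url} dicts, mirroring item.link in the template above, and apart from the whitespace between items the output matches the example's compact style.

```python
import json


def dump_extlinks(results: list[dict]) -> str:
    """Build [{"query": ..., "links": [...]}, ...] from per-query results."""
    data = [
        {"query": r["query"], "links": [item["link"] for item in r["extlinks"]]}
        for r in results
    ]
    # Compact separators match the style of the example output.
    return json.dumps(data, separators=(",", ":"))


result = {
    "query": "https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/",
    "extlinks": [{"link": "https://vc.ru/job"}, {"link": "https://t.co/fD4AiCpbrV"}],
}
print(dump_extlinks([result]))
```

Building a list and serializing it once avoids the manual comma bookkeeping (the firstString flag) that the streaming template needs.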

Results Processing

A-Parser lets you process results directly during scraping. This section covers the most common cases for the Check::BackLink scraper.

Add a filter and select the variable $exists - Link exists from the dropdown list. Choose the type String equals, and enter 1 (meaning the backlink is present) in the String field. This filter outputs only the results in which the backlink was found.

Add a Result Constructor and select the source $p1.extlinks.$i.link - Link from the dropdown list. Choose the type Extract Top Domain. This extracts the domains from the external links.
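Outside A-Parser, the same two steps might look like this in Python. This is a hypothetical sketch: filter_exists mimics a String equals 1 filter on $exists, and extract_domains assumes that Extract Top Domain roughly amounts to taking each URL's host, which may not match A-Parser's exact rule for subdomains.

```python
from urllib.parse import urlsplit


def filter_exists(results: list[dict]) -> list[dict]:
    """Keep results whose exists value, compared as a string, equals "1"."""
    return [r for r in results if str(r.get("exists")) == "1"]


def extract_domains(links: list[str]) -> list[str]:
    """Approximate 'Extract Top Domain' by taking each URL's host (assumption)."""
    domains = []
    for link in links:
        host = urlsplit(link).hostname
        if host:  # skip strings that have no network location
            domains.append(host)
    return domains


print(extract_domains(["https://vc.ru/job", "https://twitter.com/aktroitsky"]))
```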

Example of using a filter and a Result Constructor

How to import an example into A-Parser

eJx9VNtuGjEQ/RVkIaWR6C4Qpar2jdAgpSIhJeSJ5MHZHcDBa29sLxch/r0z3hsp
bd88M2fO3H1gjtu1fTRgwVkWzQ8s828WsdsdTzMJrXgF8br1xuO1FGptW7AT1tnW
6G48u52yDsu4sWDIec6GhI2iGwSPEYzWBBY8l451DsztM0DehZAODJowEFkiVjCi
pjDNChx85FyicsNlTnIP3zpzQisULCjLjg2p3oAxIgHEiISCaJNyV0ZoONqwc76K
oAB8uWhXhbW+ttq+1Eoo68QXaV5e1MUlO76+VnnbkWcg0qwXlF2rjU98AzNdVAuN
eoTSA099Kgl3QNYqlcvA7YiBJ4mgKrksIlBnm6jPSnz4UpRGLD6NADsyOkWVA09A
yn2V3Zy1vcyQIve+vwofFi24tNBhFlMdcUwk+dMicBjcaTPxXUf9gWk1kHIMG5AN
zPPf5EImuAaDBTrdlY5/h0zOOI51eaehcKRbgznULF66mdw3Xoke62XVDClS4VC2
Q50rGkwXlWuArO7ZA8FSbaAO40wOdXA8gwwUrU8zsUHWqD5V8WkqJ8oDszo3MYab
dztzVi2czw8vghao3Fk0GR67mc5+6JQLRbM3hu8LU+XlaIu86xFdY60WYjkpt71K
IlczPOOJGmq6WOqYyqXEgVuYNos3sOWASaibeuY89CEwaH26mIOW9udT0YXMCEzp
mmpPcUanUUvKmEv5PB2fWlizrCisnMtsFIYLYVdrEShwYavSSVCOByYPFWxt2O/2
u2GvH/a+h5JvhOLhi6qQoIKtWIsMEsEDbZYhSeG9trHe/pOue010/aswFm5fkiHM
6jgG844jhL0NhPp/TsjQvSKelU5BgrUnWbl3HD8eL8HPCbhxIkYPIvkWdrGw6zAB
7laSv4WMWudgqfFccbA07/JzrT/ow9kXGx2OeAjv9rFA0mwJhzpcEut/y97xN4Qy
DUs=
tip

You can add the Result Constructor as many times as you need.


Possible Settings

Supports all settings of the HTML::LinkExtractor scraper, plus the following:

Parameter name | Default value | Description
---|---|---
Check robots.txt | | Whether to check if indexing of the page is disallowed by robots.txt
Match link by substring | | Whether to search for the link by substring. This allows checking links without a scheme, for example by domain alone without the http:// prefix
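The Check robots.txt option can be approximated with Python's standard urllib.robotparser. This sketch parses robots.txt text that is already in hand (no network access), assumes the wildcard user agent * (A-Parser may use the configured User-Agent instead), and returns 1 or 0 like the $robots variable; blocked_by_robots is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, page_url: str, user_agent: str = "*") -> int:
    """Return 1 if robots.txt disallows page_url for user_agent, else 0."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return 0 if parser.can_fetch(user_agent, page_url) else 1


rules = "User-agent: *\nDisallow: /private/\n"
print(blocked_by_robots(rules, "https://example.com/private/page"))
print(blocked_by_robots(rules, "https://example.com/public/"))
```

In a real check, the robots.txt file would first be downloaded from the root of the page's host, for example with RobotFileParser.set_url() followed by read().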