Check::BackLink - checks for the presence of a link (links) in the link database

The parser checks backlinks, that is, links to your site placed on pages of other sites.

A-Parser functionality allows you to save parsing settings for further use (presets), set a parsing schedule, and much more.

Results can be saved in whatever format and structure you need, thanks to the built-in Template Toolkit template engine, which lets you apply additional logic to results and output data in various formats, including JSON, SQL, and CSV.

Monitoring backlinks

Periodic backlink checks, with the results appended to an SQLite database table

List of collected data

  • The total number of external and internal links on the page
  • Whether the link is present on the specified page (0 or 1)
    • 0 - there is no exact match for the backlink
    • 1 - there is an exact match for the backlink
  • Whether the specified page is blocked from crawling through robots.txt (0 or 1)
  • Whether page indexing is blocked through the robots meta tag (noindex), and whether link traversal is blocked there (nofollow)
  • Whether link traversal is blocked through the rel=nofollow attribute on the link

Data collected by the Check::BackLink parser

Additional data that can be obtained:

  • The number of external and internal links on the page
  • A list of all external and internal links on the page

Capabilities

  • Checks for the presence of a link on the specified page
  • Checks whether the page is closed for indexing through robots.txt
  • Checks the robots meta tag for the presence of noindex and nofollow values
  • Checks for the presence of rel=nofollow on the found link
  • Can search for a link by string matching, which allows checking links without specifying a scheme
  • Ability to specify your own User-Agent header
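
The checks listed above can be illustrated outside A-Parser with a small sketch. This is not the parser's implementation: the class name `BacklinkScanner`, the function `check_backlink`, and the keys of the returned dictionary are hypothetical and chosen only for illustration. The sketch uses Python's standard `html.parser` module and works on HTML that has already been downloaded.

```python
from html.parser import HTMLParser


class BacklinkScanner(HTMLParser):
    """Collects <a href> links and robots meta tag values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []           # (href, list of rel tokens) pairs
        self.meta_robots = set()  # values such as "noindex", "nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], rel))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            self.meta_robots.update(t.strip() for t in content.split(","))


def check_backlink(html, target, by_substring=False):
    """Return 0/1 flags for one page and one target link (hypothetical keys)."""
    scanner = BacklinkScanner()
    scanner.feed(html)
    # Exact match by default; substring match allows omitting the scheme.
    found = [
        rel for href, rel in scanner.links
        if (target in href if by_substring else href == target)
    ]
    return {
        "exists": int(bool(found)),
        "nofollow": int(any("nofollow" in rel for rel in found)),
        "noindex": int("noindex" in scanner.meta_robots),
        "meta_nofollow": int("nofollow" in scanner.meta_robots),
    }
```

With `by_substring=True`, a target such as `lenta.ru` matches any link whose href contains that string, which mirrors the idea of searching without a scheme.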

Use cases

  • Checking that your links are placed on the specified pages
  • Finding links that are shown only to a certain User-Agent (for example, the Google bot)

Query examples

Each query consists of the page to search and the link to look for, separated by a space:

https://fishki.net/ https://lenta.ru/news/2020/12/18/lavina/
https://en.wikipedia.org/wiki/Moscow https://lenta.ru/news/2005/12/23/city/
http://soccerjerseys.in.net/ https://lenta.ru/news/2012/03/12/homeless/
https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/

Query substitutions

You can use built-in macros to substitute subqueries from files automatically. For example, to check a set of pages against a list of links, specify the pages to search as queries:

https://fishki.net/
https://en.wikipedia.org/wiki/Moscow
http://soccerjerseys.in.net/
https://tjournal.ru/

In the query format, add a macro that substitutes additional queries from the backlinks.txt file. This checks every page in the list for every link in the file:

$query {subs:backlinks}

For each source query, the macro creates as many additional queries as there are entries in the file. The total is therefore [number of source queries (pages)] x [number of entries in the backlinks file] = [total number of queries].
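
The multiplication can be sketched in Python as a cross product of the two lists. The function `expand_queries` is hypothetical and only imitates the effect of the query format `$query {subs:backlinks}`; it is not how A-Parser implements the macro.

```python
from itertools import product


def expand_queries(pages, backlinks, template="{query} {sub}"):
    """Pair every page with every backlink, imitating '$query {subs:backlinks}'."""
    return [template.format(query=p, sub=b) for p, b in product(pages, backlinks)]


pages = ["https://fishki.net/", "https://tjournal.ru/"]
backlinks = [
    "https://lenta.ru/news/2020/12/18/lavina/",
    "https://lenta.ru/articles/2016/02/15/deathlab/",
    "https://lenta.ru/news/2012/03/12/homeless/",
]
queries = expand_queries(pages, backlinks)
print(len(queries))  # 2 pages x 3 backlinks = 6 queries
```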

You can also specify the protocol in the query format, so that bare domains can be used as queries:

http://$query 

This format will add http:// to each query.

Result output options

A-Parser supports flexible result formatting through the built-in Template Toolkit template engine, so results can be output in any form, including structured formats such as CSV or JSON.

Result format (by default):

$backlink - $checklink: $exists, blocked by robots.txt: $robots\n

By default, the result shows the page on which the search is performed ($backlink), the link being searched for ($checklink), whether the link was found ($exists), and whether the page is blocked in robots.txt ($robots).

Result example:

http://soccerjerseys.in.net/ - https://lenta.ru/news/2012/03/12/homeless/: 1, blocked by robots.txt: 0
https://tjournal.ru/ - https://lenta.ru/articles/2016/02/15/deathlab/: 0, blocked by robots.txt: 0
https://en.wikipedia.org/wiki/Moscow - https://lenta.ru/news/2005/12/23/city/: 0, blocked by robots.txt: 0
https://fishki.net/ - https://lenta.ru/news/2020/12/18/lavina/: 0, blocked by robots.txt: 0
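
If results are saved in the default format, they can be read back with a regular expression. The sketch below assumes the default format string shown above is used unchanged and that neither URL contains spaces; `parse_result_line` is a hypothetical helper name.

```python
import re

# Mirrors "$backlink - $checklink: $exists, blocked by robots.txt: $robots"
LINE_RE = re.compile(
    r"^(?P<backlink>\S+) - (?P<checklink>\S+): (?P<exists>[01]), "
    r"blocked by robots\.txt: (?P<robots>[01])$"
)


def parse_result_line(line):
    """Return a dict for one result line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    row["exists"] = int(row["exists"])
    row["robots"] = int(row["robots"])
    return row


row = parse_result_line(
    "http://soccerjerseys.in.net/ - https://lenta.ru/news/2012/03/12/homeless/: 1, "
    "blocked by robots.txt: 0"
)
print(row["checklink"], row["exists"])
```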

The $actualchecklink variable holds a value only when the backlink is found on the page; if the backlink is absent, the variable is none.

$actualbacklink and $actualchecklink are the actual links after the redirect.

The built-in tools.CSVline utility creates correct tabular documents that are ready for import into Excel or Google Sheets

Result format:

[%- tools.CSVline(p1.backlink, p1.checklink, p1.anchor, p1.nofollow, p1.noindex, p1.redirect, p1.exists, p1.robots, p1.actualbacklink, p1.actualchecklink, p1.intcount, p1.extcount) -%]

File name:

$datefile.format().csv

Initial text:

Backlink,Checklink,Anchor,Nofollow,Noindex,Redirect,Exists,Robots,Actualbacklink,Actualchecklink,Intlinks count,Extlinks count

Result example:

https://tjournal.ru/,https://lenta.ru/articles/2016/02/15/deathlab/,none,0,0,0,0,0,https://tjournal.ru/,none,112,37
https://fishki.net/,https://lenta.ru/news/2020/12/18/lavina/,none,0,0,0,0,0,https://fishki.net/,none,966,31
http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,"get more information",0,0,0,1,0,http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,89,20
https://en.wikipedia.org/wiki/Moscow,https://lenta.ru/news/2005/12/23/city/,none,0,0,0,0,0,https://en.wikipedia.org/wiki/Moscow,none,2733,598
...
Download example

How to import an example into A-Parser

eJx9VE1v4jAQ/SuR1UqtRGOg6mqVG6AidUWhS9u9UA5uMgE3jp21HaBC/Pcd5xPK
7t484zdvxjNvvCeWmcQ8aTBgDQkWe5IVZxKQ+x1LMwFeuIYw8d5ZmAguE+OxKPIy
plkKFrQhHYKGcadgsSAjBw6CIaIniMbbCGKWC0uWyw5BajyasdIpcykWlzeeVUoY
f/T8C9nhKuv5daaOh0aRvLGYDNdKF0epYiWE2lYGlxHsirOGiGsIbWHAjhtrSr96
V9WRhTZn4iRP6TrNxqUNVS5rptK49m4ul6R5yjPbwIvCp8RcQOseozXFDuHFRcQs
uFs/Lp59de3bnUUo9pFbriQTZT9cA9sevUr+O3fxUiEWj5qDGWuVostCQeCcn3Uv
F+SisAlS5EXszzKGBDETBjrEYKljhoVEX284DpJZpWeZqwf9e6LkQIgJbEC0sIJ/
mHMR4bQHMQY9VIF/h8zOOA7N845TbUBvNdbQsBTWcPbYRkVqolZ1MwRPuUXbjNxA
0NtFZwKQNT2bOliqNDRprM6hSY5yz0BGCBzWGhg1kx+UGpvW+ppW2prXurovNTUv
9TQ41dLgi44epC23plTS/e7YfJOtYgZZVRL50sUTVZw6QyVjvpph/zSPoEbm8gV3
eiZHyq2va6vMhUBVGJi36hyYSgXOaDp/FjwqUmBZ9Rp3SLGwP57LUjPNUf13rsAU
B3mctaIMmRCv88nxDWkVjcba2swElMbcrBPuS7DUq30CpGW+zqmEraH9br9Le33a
+04F23DJ6JuskSD9LU94hlNivtIr6iz6qEyotv+k6945uv4tDbn9rMgQZlQYgv7A
PsOnwW/g/zUhQ/fW8axVCgKMOarKfqhc44Y7+DkB05aHGOFIvtEuPuyORsDsWrB3
SlzrLKwU7jQO9rBsPtrmt96ffbfB/oDb8mGeSqSbrcOhD0VicBVI0Dv8AQ3PGZI=

tip

The Result format uses the Template Toolkit template engine.

What the result format is.

In the results file name, simply change the file extension to csv.

To make the "Initial text" option available in the Task Editor, enable "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
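
A CSV file produced this way can be post-processed with any CSV reader. As a minimal sketch, assuming the header row shown above, the following Python code keeps only the rows where the backlink was found; `found_backlinks` is a hypothetical helper and the sample rows are taken from the result example.

```python
import csv
import io

SAMPLE = "\n".join([
    "Backlink,Checklink,Anchor,Nofollow,Noindex,Redirect,Exists,Robots,"
    "Actualbacklink,Actualchecklink,Intlinks count,Extlinks count",
    "https://tjournal.ru/,https://lenta.ru/articles/2016/02/15/deathlab/,"
    "none,0,0,0,0,0,https://tjournal.ru/,none,112,37",
    "http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,"
    '"get more information",0,0,0,1,0,http://soccerjerseys.in.net/,'
    "https://lenta.ru/news/2012/03/12/homeless/,89,20",
])


def found_backlinks(csv_text):
    """Return the rows whose Exists column equals 1."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Exists"] == "1"]


for row in found_backlinks(SAMPLE):
    print(row["Backlink"], row["Anchor"])
```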

Output of external links from the backlink page in JSON format:

Result format:

[% data = {}; data.query = query; data.links = []; FOREACH item IN p1.extlinks; data.links.push(item.link); END; IF !firstString; ",\n"; ELSE; firstString = 0; END; data.json %]

Initial text:

[% firstString = 1 %][

Final text:

]

Result example:

[{"query":"https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/","links":["https://vc.ru/job","https://vc.ru/job/new","https://vc.ru/job","https://twitter.com/aktroitsky","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://t.co/fD4AiCpbrV","https://twitter.com/aktroitsky/statuses/1382294384931188748"]}]
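
The result example contains repeated links, because the template appends every external link as it is found. A consumer of the file may want to deduplicate them. The sketch below assumes the JSON structure shown above (a list of objects with `query` and `links` keys); `unique_links` is a hypothetical helper, and the sample is a shortened copy of the result example.

```python
import json

SAMPLE = (
    '[{"query":"https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/",'
    '"links":["https://vc.ru/job","https://vc.ru/job/new","https://vc.ru/job"]}]'
)


def unique_links(json_text):
    """Map each checked page to its external links, duplicates removed, order kept."""
    result = {}
    for item in json.loads(json_text):
        # The query is "<page> <link>", so the page is the part before the space.
        page = item["query"].split(" ", 1)[0]
        result[page] = list(dict.fromkeys(item["links"]))
    return result


print(unique_links(SAMPLE))
```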

Results processing

A-Parser can process results directly during parsing. This section covers the most popular cases for the Check::BackLink parser

Saving the domains of external links when a backlink is present

Add a filter and select the variable $exists - Link exists from the drop-down list. Select the type: String equals. Then, in the "String" field, enter the value that indicates the backlink is present: 1. This filter outputs only the results in which the backlink was found.

Add the Results builder and select the source $p1.extlinks.$i.link - Link from the drop-down list. Select the type: Extract Top Domain. This yields the domains of the external links.

Example of using a filter and the Results builder in the Check::BackLink parser
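
The same two steps, filtering by $exists and extracting domains from external links, can be sketched in Python for post-processing exported data. Both function names are hypothetical, and `naive_top_domain` is a simplification that keeps the last two labels of the host name. It does not reproduce the exact behavior of Extract Top Domain and gives wrong answers for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def naive_top_domain(url):
    """Keep the last two host labels (a simplification, not a public-suffix lookup)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def domains_for_found_backlinks(results):
    """Collect unique domains of external links from results where exists == 1."""
    domains = []
    for r in results:
        if str(r["exists"]) != "1":  # same condition as the "String equals 1" filter
            continue
        for link in r["extlinks"]:
            d = naive_top_domain(link)
            if d and d not in domains:
                domains.append(d)
    return domains


results = [
    {"exists": 1, "extlinks": ["https://twitter.com/aktroitsky", "https://vc.ru/job"]},
    {"exists": 0, "extlinks": ["https://t.co/fD4AiCpbrV"]},
]
print(domains_for_found_backlinks(results))
```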

Download example

How to import an example into A-Parser

eJx9VNtuGjEQ/RVkIaWR6C4Qpar2jdAgpSIhJeSJ5MHZHcDBa29sLxch/r0z3hsp
bd88M2fO3H1gjtu1fTRgwVkWzQ8s828WsdsdTzMJrXgF8br1xuO1FGptW7AT1tnW
6G48u52yDsu4sWDIec6GhI2iGwSPEYzWBBY8l451DsztM0DehZAODJowEFkiVjCi
pjDNChx85FyicsNlTnIP3zpzQisULCjLjg2p3oAxIgHEiISCaJNyV0ZoONqwc76K
oAB8uWhXhbW+ttq+1Eoo68QXaV5e1MUlO76+VnnbkWcg0qwXlF2rjU98AzNdVAuN
eoTSA099Kgl3QNYqlcvA7YiBJ4mgKrksIlBnm6jPSnz4UpRGLD6NADsyOkWVA09A
yn2V3Zy1vcyQIve+vwofFi24tNBhFlMdcUwk+dMicBjcaTPxXUf9gWk1kHIMG5AN
zPPf5EImuAaDBTrdlY5/h0zOOI51eaehcKRbgznULF66mdw3Xoke62XVDClS4VC2
Q50rGkwXlWuArO7ZA8FSbaAO40wOdXA8gwwUrU8zsUHWqD5V8WkqJ8oDszo3MYab
dztzVi2czw8vghao3Fk0GR67mc5+6JQLRbM3hu8LU+XlaIu86xFdY60WYjkpt71K
IlczPOOJGmq6WOqYyqXEgVuYNos3sOWASaibeuY89CEwaH26mIOW9udT0YXMCEzp
mmpPcUanUUvKmEv5PB2fWlizrCisnMtsFIYLYVdrEShwYavSSVCOByYPFWxt2O/2
u2GvH/a+h5JvhOLhi6qQoIKtWIsMEsEDbZYhSeG9trHe/pOue010/aswFm5fkiHM
6jgG844jhL0NhPp/TsjQvSKelU5BgrUnWbl3HD8eL8HPCbhxIkYPIvkWdrGw6zAB
7laSv4WMWudgqfFccbA07/JzrT/ow9kXGx2OeAjv9rFA0mwJhzpcEut/y97xN4Qy
DUs=

tip

You can add the Results builder as many times as you need.

See also: Results builder

See also: Results filters

Possible settings

Supports all settings of the HTML::LinkExtractor parser, plus the following:

Parameter name | Default value | Description
Check robots.txt | | Determines whether to check for indexing prohibition via robots.txt
Match link by substring | | Determines whether to search for a link by substring. You can check links without specifying the scheme, for example, by domain without specifying the http protocol
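
To illustrate what a robots.txt check involves, here is a minimal sketch based on Python's standard `urllib.robotparser`. It works on robots.txt text that has already been downloaded and is not the parser's own implementation. The function name `blocked_by_robots`, the default user agent, and the example.com URLs are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, page_url, user_agent="*"):
    """Return 1 if robots.txt disallows page_url for user_agent, else 0."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return int(not parser.can_fetch(user_agent, page_url))


robots = "User-agent: *\nDisallow: /private/\n"
print(blocked_by_robots(robots, "https://example.com/private/page"))
print(blocked_by_robots(robots, "https://example.com/public"))
```

The result follows the same 0 and 1 convention as the $robots variable in the result format.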