Check::BackLink - checks for the presence of a link (links) in the link database
Check::BackLink parser overview
The parser allows you to check backlinks, namely links to pages of sites that refer to your site.
A-Parser functionality allows you to save parsing settings for further use (presets), set a parsing schedule, and much more.
Saving results is possible in the format and structure that you need, thanks to the built-in powerful Template Toolkit template engine that allows you to apply additional logic to results and output data in various formats, including JSON, SQL, and CSV.
Use cases for Check::BackLink parser
Monitoring backlinks
Periodically checking backlinks and appending the results to an SQLite database table
List of collected data
- The sum of external and internal links on the page
- Checks for the presence of a link on the specified page (0 or 1)
  - 0 means that there is no exact match for the backlink
  - 1 means that there is an exact match for the backlink
- Blocking the specified page from viewing through robots.txt (0 or 1)
- Blocking page indexing through the robots meta tag with the noindex attribute, as well as blocking link traversal through the nofollow attribute
- Blocking link traversal through the rel=nofollow attribute
Additional data that can be obtained:
- The number of external and internal links on the page
- A list of all external and internal links on the page
Capabilities
- Checks for the presence of a link on the specified page, with the ability to search for a link without specifying a scheme by string matching
- Checks whether the page is closed for indexing through robots.txt
- Checks the robots meta tag for the presence of noindex and nofollow attributes
- Checks for the presence of rel=nofollow on the found link
- Search for a link by string matching
- Ability to specify your own User-Agent header
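The difference between an exact match and a string match can be illustrated outside A-Parser. The Python sketch below is not A-Parser's implementation; it is a minimal approximation that assumes a link counts as found when an a tag's href equals the wanted link (exact mode) or contains it (string mode). The names LinkCollector and check_link and the example.com URL are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, rel tokens) for every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            # rel may hold several space-separated tokens, or be missing
            rel_tokens = (attr_map.get("rel") or "").lower().split()
            self.links.append((href, rel_tokens))


def check_link(html, wanted, by_substring=False):
    """Return (exists, nofollow) as 0/1 flags for the wanted link."""
    collector = LinkCollector()
    collector.feed(html)
    for href, rel_tokens in collector.links:
        matched = wanted in href if by_substring else href == wanted
        if matched:
            return 1, int("nofollow" in rel_tokens)
    return 0, 0


page = '<p><a href="https://example.com/news/" rel="nofollow">news</a></p>'
exact = check_link(page, "https://example.com/news/")
no_scheme = check_link(page, "example.com/news", by_substring=True)
exact_without_scheme = check_link(page, "example.com/news")
```

In this sketch, the link given without a scheme is found only in string mode, which mirrors the purpose of the string matching option described above.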
Use cases
- Checking the placement of your links on specified pages
- Search for links displayed only to a certain User-Agent (for example, for the Google bot)
Query examples
Each query consists of the page on which to search for the link, followed by the link to look for, separated by a space:
https://fishki.net/ https://lenta.ru/news/2020/12/18/lavina/
https://en.wikipedia.org/wiki/Moscow https://lenta.ru/news/2005/12/23/city/
http://soccerjerseys.in.net/ https://lenta.ru/news/2012/03/12/homeless/
https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/
Query substitutions
You can use built-in macros to substitute subqueries from files automatically. For example, to check a set of sites against a list of links, specify the list of pages on which to search for links as queries:
https://fishki.net/
https://en.wikipedia.org/wiki/Moscow
http://soccerjerseys.in.net/
https://tjournal.ru/
In the query format, specify a macro that substitutes additional queries from the backlinks.txt file. This checks every page in the list for each link from the file:
$query {subs:backlinks}
For each source query, this macro creates as many additional queries as there are lines in the file, so in total it produces [number of source queries (page links)] x [number of queries in the backlinks file] = [total number of queries].
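The multiplication can be checked with a short Python sketch. It only models the counting, not A-Parser's internals or the order in which queries are issued. The contents of links_to_check stand in for a hypothetical backlinks.txt, and expand_queries is an illustrative name.

```python
from itertools import product

# Source queries: pages on which to search for links
pages = [
    "https://fishki.net/",
    "https://en.wikipedia.org/wiki/Moscow",
]

# Hypothetical contents of backlinks.txt: links to look for on every page
links_to_check = [
    "https://lenta.ru/news/2020/12/18/lavina/",
    "https://lenta.ru/news/2005/12/23/city/",
    "https://lenta.ru/news/2012/03/12/homeless/",
]


def expand_queries(pages, links):
    """Pair every page with every link, as "$query {subs:backlinks}" would."""
    return [f"{page} {link}" for page, link in product(pages, links)]


queries = expand_queries(pages, links_to_check)
print(len(queries))  # 2 pages x 3 links = 6 queries
```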
You can also specify the protocol in the query format so that only domains can be used as queries:
http://$query
This format will add http:// to each query.
Result output options
A-Parser supports flexible result formatting thanks to the built-in Template Toolkit template engine, which allows it to output results in any form, as well as in a structured form, such as CSV or JSON.
Outputting backlink presence, the page where the backlink is located, and whether that page is blocked in robots.txt
Result format (by default):
$backlink - $checklink: $exists, blocked by robots.txt: $robots\n
By default, each result line shows the backlink page being searched ($backlink), the link being checked ($checklink), whether the link was found ($exists), and whether the page is blocked in the robots.txt file ($robots).
Result example:
http://soccerjerseys.in.net/ - https://lenta.ru/news/2012/03/12/homeless/: 1, blocked by robots.txt: 0
https://tjournal.ru/ - https://lenta.ru/articles/2016/02/15/deathlab/: 0, blocked by robots.txt: 0
https://en.wikipedia.org/wiki/Moscow - https://lenta.ru/news/2005/12/23/city/: 0, blocked by robots.txt: 0
https://fishki.net/ - https://lenta.ru/news/2020/12/18/lavina/: 0, blocked by robots.txt: 0
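If results saved in the default format need to be post-processed outside A-Parser, each line can be split back into its four fields. The following Python sketch assumes the default format shown above and URLs that contain no spaces; parse_result_line is a hypothetical helper name.

```python
import re

# Mirrors "$backlink - $checklink: $exists, blocked by robots.txt: $robots"
RESULT_LINE = re.compile(
    r"^(?P<backlink>\S+) - (?P<checklink>\S+): (?P<exists>[01]), "
    r"blocked by robots\.txt: (?P<robots>[01])$"
)


def parse_result_line(line):
    """Return the fields of one result line as a dict, or None if it does not match."""
    match = RESULT_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["exists"] = int(fields["exists"])
    fields["robots"] = int(fields["robots"])
    return fields


line = ("http://soccerjerseys.in.net/ - "
        "https://lenta.ru/news/2012/03/12/homeless/: 1, blocked by robots.txt: 0")
parsed = parse_result_line(line)
```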
Outputting backlink presence, additional backlink analysis parameters, and the pages with backlinks to a CSV table
The $actualchecklink variable has a value only if the backlink is found on the page; if there is no backlink, its value is none.
$actualbacklink and $actualchecklink are the actual links after the redirect.
The built-in tools.CSVline utility creates correct tabular documents that are ready for import into Excel or Google Sheets.
Result format:
[%- tools.CSVline(p1.backlink, p1.checklink, p1.anchor, p1.nofollow, p1.noindex, p1.redirect, p1.exists, p1.robots, p1.actualbacklink, p1.actualchecklink, p1.intcount, p1.extcount) -%]
File name:
$datefile.format().csv
Initial text:
Backlink,Checklink,Anchor,Nofollow,Noindex,Redirect,Exists,Robots,Actualbacklink,Actualchecklink,Intlinks count,Extlinks count
Result example:
https://tjournal.ru/,https://lenta.ru/articles/2016/02/15/deathlab/,none,0,0,0,0,0,https://tjournal.ru/,none,112,37
https://fishki.net/,https://lenta.ru/news/2020/12/18/lavina/,none,0,0,0,0,0,https://fishki.net/,none,966,31
http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,"get more information",0,0,0,1,0,http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,89,20
https://en.wikipedia.org/wiki/Moscow,https://lenta.ru/news/2005/12/23/city/,none,0,0,0,0,0,https://en.wikipedia.org/wiki/Moscow,none,2733,598
...
Download example
How to import the example into A-Parser
eJx9VE1v4jAQ/SuR1UqtRGOg6mqVG6AidUWhS9u9UA5uMgE3jp21HaBC/Pcd5xPK
7t484zdvxjNvvCeWmcQ8aTBgDQkWe5IVZxKQ+x1LMwFeuIYw8d5ZmAguE+OxKPIy
plkKFrQhHYKGcadgsSAjBw6CIaIniMbbCGKWC0uWyw5BajyasdIpcykWlzeeVUoY
f/T8C9nhKuv5daaOh0aRvLGYDNdKF0epYiWE2lYGlxHsirOGiGsIbWHAjhtrSr96
V9WRhTZn4iRP6TrNxqUNVS5rptK49m4ul6R5yjPbwIvCp8RcQOseozXFDuHFRcQs
uFs/Lp59de3bnUUo9pFbriQTZT9cA9sevUr+O3fxUiEWj5qDGWuVostCQeCcn3Uv
F+SisAlS5EXszzKGBDETBjrEYKljhoVEX284DpJZpWeZqwf9e6LkQIgJbEC0sIJ/
mHMR4bQHMQY9VIF/h8zOOA7N845TbUBvNdbQsBTWcPbYRkVqolZ1MwRPuUXbjNxA
0NtFZwKQNT2bOliqNDRprM6hSY5yz0BGCBzWGhg1kx+UGpvW+ppW2prXurovNTUv
9TQ41dLgi44epC23plTS/e7YfJOtYgZZVRL50sUTVZw6QyVjvpph/zSPoEbm8gV3
eiZHyq2va6vMhUBVGJi36hyYSgXOaDp/FjwqUmBZ9Rp3SLGwP57LUjPNUf13rsAU
B3mctaIMmRCv88nxDWkVjcba2swElMbcrBPuS7DUq30CpGW+zqmEraH9br9Le33a
+04F23DJ6JuskSD9LU94hlNivtIr6iz6qEyotv+k6945uv4tDbn9rMgQZlQYgv7A
PsOnwW/g/zUhQ/fW8axVCgKMOarKfqhc44Y7+DkB05aHGOFIvtEuPuyORsDsWrB3
SlzrLKwU7jQO9rBsPtrmt96ffbfB/oDb8mGeSqSbrcOhD0VicBVI0Dv8AQ3PGZI=
The Result format uses the Template Toolkit template engine.
In the results file name, simply change the file extension to csv.
For the "Initial text" option to be available in the Task Editor, you need to enable "More options". In "Initial text", enter the column names separated by commas and leave the second line empty.
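Once the CSV file is saved, it can be read by any CSV-aware tool. As an illustration only, the Python sketch below reads text in the layout shown above (the header row from "Initial text" followed by result rows) and keeps the rows where the backlink was found. The sample is an abridged copy of the example result, and found_backlinks is a hypothetical helper; a real script would read the saved file instead of an in-memory string.

```python
import csv
import io

# Header row from "Initial text" plus two rows from the result example
sample = "\n".join([
    "Backlink,Checklink,Anchor,Nofollow,Noindex,Redirect,Exists,Robots,"
    "Actualbacklink,Actualchecklink,Intlinks count,Extlinks count",
    "https://tjournal.ru/,https://lenta.ru/articles/2016/02/15/deathlab/,"
    "none,0,0,0,0,0,https://tjournal.ru/,none,112,37",
    "http://soccerjerseys.in.net/,https://lenta.ru/news/2012/03/12/homeless/,"
    '"get more information",0,0,0,1,0,http://soccerjerseys.in.net/,'
    "https://lenta.ru/news/2012/03/12/homeless/,89,20",
]) + "\n"


def found_backlinks(csv_text):
    """Return the rows whose Exists column is 1, as dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["Exists"] == "1"]


found = found_backlinks(sample)
for row in found:
    print(row["Backlink"], "->", row["Checklink"], "anchor:", row["Anchor"])
```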
Outputting external links from the backlink page in JSON format:
Result format:
[% data = {}; data.query = query; data.links = []; FOREACH item IN p1.extlinks; data.links.push(item.link); END; IF !firstString; ",\n"; ELSE; firstString = 0; END; data.json %]
Initial text:
[% firstString = 1 %][
Final text:
]
Result example:
[{"query":"https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/","links":["https://vc.ru/job","https://vc.ru/job/new","https://vc.ru/job","https://twitter.com/aktroitsky","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://twitter.com/aktroitsky/statuses/1382294384931188748","https://t.co/fD4AiCpbrV","https://twitter.com/aktroitsky/statuses/1382294384931188748"]}]
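The example result contains the same external link several times, because every occurrence on the page is listed. If a consumer of the JSON file needs each link once, the duplicates can be dropped after loading. The Python sketch below assumes the array-of-objects layout shown above; the sample is a shortened copy of the example result, and unique_links is a hypothetical helper.

```python
import json

raw = (
    '[{"query":"https://tjournal.ru/ https://lenta.ru/articles/2016/02/15/deathlab/",'
    '"links":["https://vc.ru/job","https://vc.ru/job/new","https://vc.ru/job",'
    '"https://twitter.com/aktroitsky","https://twitter.com/aktroitsky"]}]'
)


def unique_links(json_text):
    """Map each query to its external links, deduplicated in first-seen order."""
    result = {}
    for item in json.loads(json_text):
        # dict.fromkeys keeps insertion order, so the first occurrence wins
        result[item["query"]] = list(dict.fromkeys(item["links"]))
    return result


links_by_query = unique_links(raw)
```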
Results processing
A-Parser allows you to process results directly during parsing. This section covers the most popular cases for the Check::BackLink parser.
Saving the domains of external links when backlinks are present
Add a filter and select the variable $exists - Link exists from the drop-down list. Select the type: String equals. Then, in the "String" field, enter the value that indicates the backlink is present: 1. This filter outputs only the results in which the backlink is present.
Add a Results builder and select the source $p1.extlinks.$i.link - Link from the drop-down list. Select the type: Extract Top Domain. This yields the domains of the external links.
Download example
How to import the example into A-Parser
eJx9VNtuGjEQ/RVkIaWR6C4Qpar2jdAgpSIhJeSJ5MHZHcDBa29sLxch/r0z3hsp
bd88M2fO3H1gjtu1fTRgwVkWzQ8s828WsdsdTzMJrXgF8br1xuO1FGptW7AT1tnW
6G48u52yDsu4sWDIec6GhI2iGwSPEYzWBBY8l451DsztM0DehZAODJowEFkiVjCi
pjDNChx85FyicsNlTnIP3zpzQisULCjLjg2p3oAxIgHEiISCaJNyV0ZoONqwc76K
oAB8uWhXhbW+ttq+1Eoo68QXaV5e1MUlO76+VnnbkWcg0qwXlF2rjU98AzNdVAuN
eoTSA099Kgl3QNYqlcvA7YiBJ4mgKrksIlBnm6jPSnz4UpRGLD6NADsyOkWVA09A
yn2V3Zy1vcyQIve+vwofFi24tNBhFlMdcUwk+dMicBjcaTPxXUf9gWk1kHIMG5AN
zPPf5EImuAaDBTrdlY5/h0zOOI51eaehcKRbgznULF66mdw3Xoke62XVDClS4VC2
Q50rGkwXlWuArO7ZA8FSbaAO40wOdXA8gwwUrU8zsUHWqD5V8WkqJ8oDszo3MYab
dztzVi2czw8vghao3Fk0GR67mc5+6JQLRbM3hu8LU+XlaIu86xFdY60WYjkpt71K
IlczPOOJGmq6WOqYyqXEgVuYNos3sOWASaibeuY89CEwaH26mIOW9udT0YXMCEzp
mmpPcUanUUvKmEv5PB2fWlizrCisnMtsFIYLYVdrEShwYavSSVCOByYPFWxt2O/2
u2GvH/a+h5JvhOLhi6qQoIKtWIsMEsEDbZYhSeG9trHe/pOue010/aswFm5fkiHM
6jgG844jhL0NhPp/TsjQvSKelU5BgrUnWbl3HD8eL8HPCbhxIkYPIvkWdrGw6zAB
7laSv4WMWudgqfFccbA07/JzrT/ow9kXGx2OeAjv9rFA0mwJhzpcEut/y97xN4Qy
DUs=
You can add the Results builder as many times as you need.
See also: Results builder
See also: Results filters
Possible settings
Supports all settings of the HTML::LinkExtractor parser, plus the following:
Parameter name | Default value | Description |
---|---|---|
Check robots.txt | ☑ | Determines whether to check for indexing prohibition via robots.txt |
Match link by substring | ☐ | Determines whether to search for a link by substring. You can check links without specifying the scheme, for example, by domain without specifying the http protocol |
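To make the robots.txt check more concrete, here is a minimal Python sketch that uses the standard urllib.robotparser module. It is not how A-Parser performs the check; it only shows what "blocked by robots.txt" means for a given page and User-Agent. The rules, the example.com URLs, and the blocked_by_robots helper are hypothetical, and the rules are passed in as text so that the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, page_url, user_agent="*"):
    """Return 1 if the rules forbid user_agent from fetching page_url, else 0."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return int(not parser.can_fetch(user_agent, page_url))


# Hypothetical robots.txt content
rules = "User-agent: *\nDisallow: /private/\n"

blocked = blocked_by_robots(rules, "https://example.com/private/page.html")
allowed = blocked_by_robots(rules, "https://example.com/news/")
```

The 0/1 return value follows the same convention as the $robots variable in the result examples above.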