Rank::Archive - Scraper for the date of the first and last caching of a site in the web archive

Overview

Rank::Archive is a Web Archive scraper that determines the dates of the first and last caching of a site, as well as the number of saved copies of the site.

A-Parser lets you save the settings of the Rank::Archive scraper as presets for later use, set a scraping schedule, and much more.

Results can be saved in whatever form and structure you need: the built-in Template Toolkit template engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Collected data

  • Date of the first caching
  • Date of the last caching
  • Number of saved copies of the site

Use cases

  • Checking whether a copy of the site exists in the web archive, and finding the dates of the first and last copies
  • Domain evaluation: a large number of site copies in the web archive may indicate high site traffic

Queries

Each query is the domain of the site to look up, for example:

a-parser.com
www.yahoo.com
google.com
vk.com
youtube.com
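
If the source list contains full URLs rather than bare domains, it can be reduced to domains before being used as queries. The following is a minimal Python sketch under that assumption; the helper names `to_domain` and `unique_domains` are hypothetical and are not part of A-Parser.

```python
from urllib.parse import urlsplit


def to_domain(raw):
    """Return the lowercase host part of a URL or bare domain, or None if empty."""
    raw = raw.strip()
    if not raw:
        return None
    # urlsplit only recognises the host when a scheme or a leading "//" is present,
    # so bare domains such as "a-parser.com" get a "//" prefix first.
    parts = urlsplit(raw if "//" in raw else "//" + raw)
    return parts.hostname


def unique_domains(lines):
    """Convert lines to domains, dropping blanks and duplicates while keeping order."""
    domains = (to_domain(line) for line in lines)
    return list(dict.fromkeys(d for d in domains if d))


queries = unique_domains([
    "https://www.yahoo.com/news?x=1",
    "a-parser.com",
    "http://A-Parser.com/",
    "",
])
print("\n".join(queries))
```

The resulting list has one domain per line, in the same form as the query examples above.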

Output results examples

A-Parser supports flexible formatting of results thanks to the built-in Template Toolkit template engine, which can output results in an arbitrary form as well as in a structured one, for example CSV or JSON.

Default output

Result format:

$query: $first - $last ($times times)\n

The result shows the site, the dates of indexing the first and last copies, and the number of saved copies of the site:

vk.com: 11.05.2000 - 21.05.2014(8965 times)
youtube.com: 28.04.2005 - 21.05.2014(28150 times)
a-parser.com: 16.03.2012 - 17.05.2014(56 times)
google.com: 11.11.1998 - 21.05.2014(34575 times)
www.yahoo.com: 17.10.1996 - 20.05.2014(28537 times)
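
For downstream processing, lines in the default format can be parsed back into structured records. The sketch below is an illustration in Python, not part of A-Parser; `ArchiveRecord` and `parse_line` are hypothetical names. It assumes dates are always in `dd.mm.yyyy` form and treats the space before the parenthesis as optional, since the sample output above omits it.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class ArchiveRecord(NamedTuple):
    query: str
    first: date
    last: date
    times: int


# "<query>: <dd.mm.yyyy> - <dd.mm.yyyy> (<n> times)", space before "(" optional.
LINE_RE = re.compile(
    r"^(?P<query>\S+): (?P<first>\d{2}\.\d{2}\.\d{4}) - "
    r"(?P<last>\d{2}\.\d{2}\.\d{4})\s*\((?P<times>\d+) times\)$"
)


def _to_date(text: str) -> date:
    return datetime.strptime(text, "%d.%m.%Y").date()


def parse_line(line: str) -> Optional[ArchiveRecord]:
    """Parse one result line; return None if it does not match the format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return ArchiveRecord(
        query=match["query"],
        first=_to_date(match["first"]),
        last=_to_date(match["last"]),
        times=int(match["times"]),
    )


record = parse_line("vk.com: 11.05.2000 - 21.05.2014(8965 times)")
print(record)
```

Lines that do not match, such as blank lines, yield `None` and can be skipped by the caller.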

Saving in SQL format

Result format:

[% "INSERT INTO archive VALUES('" _ query _ "', '" _ first _ "', '" _ last _ "', '" _ times _ "')\n" %]

Example of the result:

INSERT INTO archive VALUES('http://a-parser.com/', '16.03.2012', '16.01.2021', '290')
INSERT INTO archive VALUES('http://yandex.ru/', '06.12.1998', '25.03.2021', '141421')
INSERT INTO archive VALUES('http://facebook.com/', '12.12.1998', '25.03.2021', '4877156')
INSERT INTO archive VALUES('http://vk.com/', '11.05.2000', '25.03.2021', '172132')
INSERT INTO archive VALUES('http://google.com/', '11.11.1998', '25.03.2021', '5969502')
INSERT INTO archive VALUES('http://youtube.com/', '28.04.2005', '25.03.2021', '2309673')
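
These statements assume that a four-column table named `archive` already exists. As a minimal sketch, the Python snippet below loads them into an in-memory SQLite database; the schema (column names `site`, `first_copy`, `last_copy`, `copies` and their types) is an assumption, since the source shows only the INSERT statements, and `load_inserts` is a hypothetical helper. Note that the template concatenates values without escaping, so a query containing a single quote would produce an invalid statement.

```python
import sqlite3

# Assumed schema: only the table name and the column count come from the
# INSERT statements; the column names and types are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS archive (
    site       TEXT,
    first_copy TEXT,
    last_copy  TEXT,
    copies     INTEGER
)
"""


def load_inserts(conn, statements):
    """Create the table if needed and execute each non-empty INSERT line."""
    conn.execute(SCHEMA)
    count = 0
    for statement in statements:
        statement = statement.strip()
        if not statement:
            continue
        conn.execute(statement)
        count += 1
    conn.commit()
    return count


sample = [
    "INSERT INTO archive VALUES('http://a-parser.com/', '16.03.2012', '16.01.2021', '290')",
    "INSERT INTO archive VALUES('http://vk.com/', '11.05.2000', '25.03.2021', '172132')",
]

conn = sqlite3.connect(":memory:")
loaded = load_inserts(conn, sample)
rows = conn.execute(
    "SELECT site, copies FROM archive ORDER BY copies DESC"
).fetchall()
print(loaded, rows)
```

Because the `copies` column is declared as INTEGER, SQLite stores the quoted numbers as integers, so the ordering is numeric rather than textual.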

Dumping results to JSON

Result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.first = p1.first;
obj.last = p1.last;
obj.times = p1.times;

obj.json %]

Start text:

[

End text:

]

Example of the result:

[
{"first":"12.12.1998","query":"http://facebook.com/","last":"25.03.2021","times":4877156},
{"first":"06.12.1998","query":"http://yandex.ru/","last":"25.03.2021","times":141421},
{"first":"16.03.2012","query":"http://a-parser.com/","last":"16.01.2021","times":290},
{"first":"28.04.2005","query":"http://youtube.com/","last":"25.03.2021","times":2309673},
{"first":"11.11.1998","query":"http://google.com/","last":"25.03.2021","times":5969502},
{"first":"11.05.2000","query":"http://vk.com/","last":"25.03.2021","times":172132}
]

Tip: To make the "Start text" and "End text" options available in the Task Editor, activate "More options".
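
A JSON dump in the shape shown above can be loaded with any JSON library. The Python sketch below is an illustration only: `load_results` and `copies_per_year` are hypothetical helpers, and copies per year is an assumed metric for comparing domains, not something A-Parser computes.

```python
import json
from datetime import datetime


def load_results(text):
    """Parse the JSON array and convert dd.mm.yyyy strings to date objects."""
    records = json.loads(text)
    for rec in records:
        rec["first"] = datetime.strptime(rec["first"], "%d.%m.%Y").date()
        rec["last"] = datetime.strptime(rec["last"], "%d.%m.%Y").date()
        rec["times"] = int(rec["times"])
    return records


def copies_per_year(rec):
    """Average number of saved copies per year between first and last caching."""
    span_days = max((rec["last"] - rec["first"]).days, 1)  # avoid division by zero
    return rec["times"] * 365.25 / span_days


sample = """[
{"first":"16.03.2012","query":"http://a-parser.com/","last":"16.01.2021","times":290},
{"first":"11.05.2000","query":"http://vk.com/","last":"25.03.2021","times":172132}
]"""

records = load_results(sample)
ranked = sorted(records, key=copies_per_year, reverse=True)
for rec in ranked:
    print(rec["query"], round(copies_per_year(rec), 1))
```

Sorting by such a rate is one possible way to compare domains of different ages; the raw `times` value works just as well when age does not matter.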

Possible settings