
SE::Google::Cache - Checking for Page Presence in Google Cache


Overview of the scraper

The Google Cache scraper checks for the presence of a page in Google's cache.

Results can be saved in whatever form and structure you need using the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL and CSV.

Collected data

  • Date the page was saved to the cache
  • The same date as a Unix timestamp
  • Presence of the page in the cache
  • Cached page content without the Google toolbar

Use cases

  • Determining the presence of a page in Google's cache
  • Getting the date of the last Google snapshot
  • Getting the date of the last Google snapshot in Unix format
  • Getting the content of the page that is in the cache

Queries

Each query is the URL of the page to check, for example:

https://a-parser.com
https://lenta.ru/
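
Since each query has to be a page URL, it can help to clean a raw list before passing it to the scraper. The following is a minimal Python sketch of such a pre-check, not part of A-Parser itself; the function name `clean_queries` and the rule of accepting only absolute `http`/`https` URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def clean_queries(lines):
    """Keep only absolute http(s) URLs, trimmed and de-duplicated in order.

    Hypothetical helper: the acceptance rule is an assumption, not a
    documented requirement of the scraper.
    """
    seen = set()
    result = []
    for raw in lines:
        url = raw.strip()
        parts = urlsplit(url)
        # Skip blank lines and anything without an http(s) scheme and a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    raw = ["https://a-parser.com", " https://lenta.ru/ ", "lenta.ru", ""]
    for query in clean_queries(raw):
        print(query)
```

The cleaned list can then be used as the query input, one URL per line.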

Examples of output results

Default output

Result format:

$query: $exists - $date\n

Example of a result showing the query URL, presence in the cache (1 or 0), and the caching date:

https://lenta.ru/: 1 -  25 Dec 2020 10:44:05 GMT
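
If the default output is processed by another program afterwards, each line can be split back into its three fields. Below is a minimal Python sketch that assumes the exact `$query: $exists - $date` format shown above; the function name `parse_line` is hypothetical, and the handling of an empty date for pages that are not cached is an assumption, since this page shows no such example.

```python
import re

# Matches "<query>: <0|1> -<date>", the default result format.
# The query is assumed to contain no whitespace, as is usual for a URL.
LINE_RE = re.compile(r"^(?P<query>\S+): (?P<exists>[01]) -(?P<date>.*)$")


def parse_line(line):
    """Split one default-format result line into a dictionary."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unexpected result line: {line!r}")
    date = match["date"].strip()
    return {
        "query": match["query"],
        "exists": match["exists"] == "1",
        # Assumption: a page that is not cached yields an empty date.
        "date": date or None,
    }


if __name__ == "__main__":
    print(parse_line("https://lenta.ru/: 1 -  25 Dec 2020 10:44:05 GMT"))
```

The leading whitespace before the date in the sample line is stripped, so the `date` field holds only the date text.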

Output to CSV table

Result format:

[% tools.CSVline(query, exists, date, timestamp) %]

Example result:

https://a-parser.com/wiki/index/,1," 18 Mar 2021 20:05:44 GMT",1616097944
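
A CSV line in this format can be read with any standard CSV parser. The Python sketch below is an illustration only: it assumes the four-column order from the template above (query, exists, date, timestamp) and the date layout seen in the example, and the function name `parse_csv_line` is hypothetical. It also cross-checks that the textual date and the Unix timestamp describe the same moment.

```python
import csv
from datetime import datetime, timezone

# Date layout as seen in the example output, e.g. "18 Mar 2021 20:05:44 GMT".
DATE_FORMAT = "%d %b %Y %H:%M:%S GMT"


def parse_csv_line(line):
    """Parse one CSV result line into typed fields."""
    query, exists, date, timestamp = next(csv.reader([line]))
    date = date.strip()  # the sample date carries a leading space
    parsed = None
    if date:
        # The date is given in GMT, so attach the UTC time zone explicitly.
        parsed = datetime.strptime(date, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return {
        "query": query,
        "exists": exists == "1",
        "date": parsed,
        # Assumption: the timestamp column is empty when no date is available.
        "timestamp": int(timestamp) if timestamp else None,
    }


if __name__ == "__main__":
    row = parse_csv_line(
        'https://a-parser.com/wiki/index/,1," 18 Mar 2021 20:05:44 GMT",1616097944'
    )
    # Both columns should describe the same moment.
    print(int(row["date"].timestamp()) == row["timestamp"])
```

For the example line, the parsed date converts to the same value as the timestamp column, 1616097944.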

Possible settings

| Parameter | Default Value | Description |
|---|---|---|
| Use sessions | | Saves good sessions, allowing faster scraping with fewer errors |
| Util::ReCaptcha2 preset | default | Determines whether to use Util::ReCaptcha2 to bypass reCAPTCHA |
| Remove toolbar | | Specifies whether to remove the toolbar from the page |