SE::Google::Cache - Checking for page presence in Google's cache
Overview of the scraper
The Google Cache scraper checks for the presence of a page in Google's cache.
Results can be saved in whatever form and structure you need: the built-in Template Toolkit templating engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Collected data
- Date the page was indexed in the cache
- The same date as a Unix timestamp
- Presence of the page in the cache
- Page content with the Google toolbar removed
Use cases
- Determining the presence of a page in Google's cache
- Obtaining the date of the last Google snapshot
- Obtaining the date of the last Google snapshot as a Unix timestamp
- Retrieving the content of a page that is in the cache
Queries
Each query is the URL of a page to check, for example:
https://a-parser.com
https://lenta.ru/
Examples of output results
Default output
Result format:
$query: $exists - $date\n
Example of a result, showing the query URL, presence in the cache (1 or 0), and the caching date:
https://lenta.ru/: 1 - 25 Dec 2020 10:44:05 GMT
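If the saved results are processed further outside the scraper, a line in the default format can be split back into its fields. The following is a minimal Python sketch, not part of A-Parser: the function name `parse_result_line` is hypothetical, and it assumes that lines follow the `$query: $exists - $date` format exactly and that the date is empty when the page is not in the cache.

```python
import re
from datetime import datetime, timezone

# Assumed line shape, taken from the default format "$query: $exists - $date".
# The greedy query group lets the URL itself contain colons ("https://...").
LINE_RE = re.compile(r"^(?P<query>.+): (?P<exists>[01]) -\s*(?P<date>.*)$")


def parse_result_line(line):
    """Split one default-format result line into query, exists, and date."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")

    date_text = match.group("date").strip()
    cached_at = None
    if date_text:
        # Assumption: the date always looks like "25 Dec 2020 10:44:05 GMT".
        # %b expects English month names (the default C locale).
        cached_at = datetime.strptime(
            date_text, "%d %b %Y %H:%M:%S GMT"
        ).replace(tzinfo=timezone.utc)

    return {
        "query": match.group("query"),
        "exists": match.group("exists") == "1",
        "cached_at": cached_at,
    }


result = parse_result_line("https://lenta.ru/: 1 - 25 Dec 2020 10:44:05 GMT")
print(result["query"], result["exists"], result["cached_at"].isoformat())
```

Matching the query greedily keeps the `https://` prefix intact, since the URL itself contains a colon.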
Output in a CSV table
Result format:
[% tools.CSVline(query, exists, date, timestamp) %]
Example of a result:
https://a-parser.com/wiki/index/,1," 18 Mar 2021 20:05:44 GMT",1616097944
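A CSV line like the one above can be read with a standard CSV parser. The following is a minimal Python sketch, not part of A-Parser: the function name `parse_csv_row` is hypothetical, and it assumes the column order `query, exists, date, timestamp` from the format above and that the timestamp is in seconds since the Unix epoch. The leading space inside the quoted date field is stripped.

```python
import csv
from datetime import datetime, timezone


def parse_csv_row(line):
    """Parse one CSV result line (query, exists, date, timestamp)."""
    # csv.reader handles the quoted date field.
    query, exists, date_text, timestamp = next(csv.reader([line]))

    timestamp = timestamp.strip()
    cached_at = None
    if timestamp:
        # Assumption: the timestamp is seconds since the Unix epoch, in UTC.
        cached_at = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)

    return {
        "query": query,
        "exists": exists == "1",
        "date": date_text.strip(),  # drop the leading space seen in the example
        "cached_at": cached_at,
    }


row = parse_csv_row(
    'https://a-parser.com/wiki/index/,1," 18 Mar 2021 20:05:44 GMT",1616097944'
)
print(row["query"], row["exists"], row["date"], row["cached_at"].isoformat())
```

Converting the Unix timestamp avoids parsing the textual date, so the result does not depend on locale settings.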
Possible settings
Parameter | Default value | Description |
---|---|---|
Use sessions | ☑ | Saves good sessions, which makes scraping faster and reduces the number of errors |
Util::ReCaptcha2 preset | default | Determines whether to use Util::ReCaptcha2 to bypass reCAPTCHA |
Remove toolbar | ☑ | Indicates whether to remove the Google toolbar from the returned page content |