
FreeAI::Perplexity - Scraper for the Perplexity AI Service

Perplexity

Overview of the scraper

The Perplexity scraper is a modern tool for collecting structured information from one of the fastest-growing AI search engines. Through integration with Perplexity, you get not just lists of links but up-to-date, concise, and relevant answers drawn from a large number of sources, including scientific articles, blogs, forums, and news portals.

The Perplexity scraper supports natural language queries, including clarifications, contextual questions, and nested constructs. It can also collect related questions and automatically add them to the query queue, which significantly expands the amount of collected information.

The processing speed reaches 500–800 queries per minute thanks to the multi-threaded operation mode. Depending on the configuration and presets used, you can receive thousands of unique text fragments and links within minutes.

The output results can be saved in any required format thanks to the powerful templating engine Template Toolkit, which allows structuring data in JSON, CSV, SQL and other formats, as well as applying filtering, sorting, and data aggregation on the fly.

The Perplexity scraper is ideal for competitive intelligence, fact and quote collection, knowledge base creation, news monitoring, and topic analysis, thanks to the high quality and contextuality of the results provided.

Data collected

  • Response text (in Markdown format)
  • Links, anchors, and snippets of data sources
  • List of related questions

Capabilities

  • Selection of information source type (multiple selections supported)
  • Substitution of similar questions into the query queue up to a specified depth
  • Bypassing protections and session support for more stable and faster operation
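
The substitution of related questions into the query queue up to a specified depth can be pictured as a breadth-first expansion. The following is a minimal Python sketch of that idea, not A-Parser's actual implementation: the function names, the `max_depth` parameter, and the stand-in related questions are all hypothetical and serve only as illustration.

```python
from collections import deque
from typing import Callable, Iterable


def expand_queries(
    seeds: Iterable[str],
    get_related: Callable[[str], list[str]],
    max_depth: int,
) -> list[tuple[str, int]]:
    """Breadth-first expansion of a query queue, limited to max_depth levels.

    get_related is a hypothetical callback that returns the related
    questions for one query (here it stands in for a scraper request).
    """
    seen: set[str] = set()
    queue: deque[tuple[str, int]] = deque()
    for query in seeds:
        if query not in seen:
            seen.add(query)
            queue.append((query, 0))

    processed: list[tuple[str, int]] = []
    while queue:
        query, depth = queue.popleft()
        processed.append((query, depth))
        if depth >= max_depth:
            continue  # do not expand beyond the configured depth
        for related in get_related(query):
            if related not in seen:  # skip questions already queued
                seen.add(related)
                queue.append((related, depth + 1))
    return processed


# Hypothetical stand-in data; real related questions come from the scraper.
FAKE_RELATED = {
    "What is a scraper?": ["How does a scraper work?", "Is scraping legal?"],
    "How does a scraper work?": ["What is a scraper?", "What is an HTML parser?"],
}


def fake_get_related(query: str) -> list[str]:
    return FAKE_RELATED.get(query, [])


result = expand_queries(["What is a scraper?"], fake_get_related, max_depth=1)
for query, depth in result:
    print(depth, query)
```

Tracking already-seen questions matters because related questions often point back to each other; without it the queue would grow with duplicates at every level.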

Use Cases

  • Collection of structured answers to thematic queries for creating knowledge bases, content plans, reference systems, and FAQ generation
  • Extraction of source links with anchors and snippets — ideal for building lists of authoritative resources, citations, and collecting backlinks
  • Collection of similar/refining questions from Perplexity results — useful for analyzing user interest, forming a semantic core, and generating ideas for articles
  • Monitoring of brand, product, or person mentions, tied to context and sources
  • Searching and analyzing expert opinions, trends, and insights from authoritative sources
  • Quick check of the relevance and completeness of information on key topics
  • Automation of competitor analysis: which resources are cited, which topics are covered, and how often
  • Support for research and analytical projects requiring the aggregation of accurate information from various sources
  • Any other tasks where you quickly need brief, accurate answers confirmed by real sources and logical context

Queries

Queries should be written as search phrases, exactly as you would enter them in the Perplexity search form, for example:

How to learn fast?
How to improve memory and concentration?
What is a scraper?
TOP 10 Runet sites

Results

info

The result examples below are shortened for clarity.

By default, the output contains each query followed by its answer, for example:

What is a scraper?
A scraper is a program or script that automatically collects, analyzes, and systematizes information from various sources, most often from websites[1][2][5][7]. The main task of a scraper is to extract the necessary data (such as texts, prices, contacts, images) from structured or semi-structured data arrays, such as HTML pages, databases, text files, and other formats[1][5][6].

**How a scraper works:**
- Scans specified data sources (e.g., webpages).
...

TOP 10 Runet sites
## TOP-10 Runet sites in June 2025

Based on fresh data from Similarweb and other analytical resources, the list of the most visited websites in the Russian segment of the Internet (Runet) includes the following resources:

1. **Yandex.ru** — the largest Russian search engine and internet portal[2][6].
2. **Google.com** — a global search engine that is also actively used in Russia[2][6].
...

### Table for clarity

| Rank | Site | Main function |
|-------|----------------|------------------------------|
| 1 | yandex.ru | Search, services, portal |
| 2 | google.com | Search |
...
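
The answers shown above carry numbered citation markers such as [1][2]. When post-processing saved answers, it can be useful to list which markers an answer uses or to strip them for clean text. The Python sketch below shows one way to do that; the function names are hypothetical, and the sample is a shortened version of the answer above. How marker numbers map to entries in the sources list is not stated here, so the sketch makes no assumption about it.

```python
import re

# A numeric marker like [3]; the lookahead skips Markdown links such as [3](url).
CITATION = re.compile(r"\[(\d+)\](?!\()")


def cited_indices(answer: str) -> list[int]:
    """Return the distinct citation numbers used in an answer, sorted."""
    return sorted({int(number) for number in CITATION.findall(answer)})


def strip_citations(answer: str) -> str:
    """Return the answer text with citation markers removed."""
    return CITATION.sub("", answer)


sample = (
    "A scraper is a program or script that automatically collects information, "
    "most often from websites[1][2][5][7]. It extracts data from HTML pages, "
    "databases, text files, and other formats[1][5][6]."
)

print(cited_indices(sample))
print(strip_citations(sample))
```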

Output result options

A-Parser supports flexible result formatting thanks to the built-in templating engine Template Toolkit, which allows it to output results in any form, as well as in structured formats, such as CSV or JSON.

Exporting a list of links

Result format:

$sources.format('$link\n')

Example result:

https://ru.wikipedia.org/wiki/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B9%D0%BD
https://www.kaspersky.ru/resource-center/definitions/what-is-bitcoin
https://dzengi.com/ru/chto-takoe-bitcoin-prostim-yazikom
https://www.sberbank.ru/ru/person/kibrary/vocabulary/bitkoin
https://help.cryptopay.me/ru/articles/3414939-%D1%87%D1%82%D0%BE-%D1%82%D0%B0%D0%BA%D0%BE%D0%B5-%D0%B1%D0%B8%D1%82%D0%BA%D0%BE%D0%B8%D0%BD
...
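
A plain list of links like this is easy to process further. As an illustration, the Python sketch below counts how many links point to each domain, which helps when building lists of frequently cited resources. The function name is hypothetical, and the sample reuses links from the example above, including the "..." line that marks shortened output.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_domains(lines: list[str]) -> Counter:
    """Count links per domain, ignoring blank or non-URL lines."""
    counts: Counter = Counter()
    for line in lines:
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue  # skips empty lines and placeholders such as "..."
        host = urlsplit(url).hostname or ""
        counts[host.removeprefix("www.")] += 1
    return counts


sample_output = """\
https://ru.wikipedia.org/wiki/%D0%91%D0%B8%D1%82%D0%BA%D0%BE%D0%B9%D0%BD
https://www.kaspersky.ru/resource-center/definitions/what-is-bitcoin
https://dzengi.com/ru/chto-takoe-bitcoin-prostim-yazikom
...
"""

domain_counts = count_domains(sample_output.splitlines())
for domain, count in domain_counts.most_common():
    print(count, domain)
```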

Outputting links, anchors, and snippets with their positions in CSV

Result format:

[% FOREACH item IN sources;
tools.CSVline(loop.count, item.link, item.anchor, item.snippet);
END %]

Example result:

...
6,https://www.kraken.com/ru/learn/what-is-bitcoin-btc,"What is Bitcoin (BTC)? Complete guide - Kraken","Learn about Bitcoin's decentralized nature, limited supply, and its role as a digital currency. Find out what powers BTC, what its core principles and use cases are."
7,https://www.vedomosti.ru/finance/articles/2024/09/23/1064026-bitkoin,"What is Bitcoin and why is it needed - Vedomosti","It is a digital currency used as a means of payment and a financial asset"
8,https://forklog.com/cryptorium/chto-takoe-bitkoin,"What is Bitcoin and how does it work in simple terms? - ForkLog","Bitcoin is a decentralized system based on the principle of direct exchange between users. The eponymous cryptocurrency BTC is used for transactions."

tip

The General result format uses the Template Toolkit templating engine to output the $sources array in a FOREACH loop.

In the result file name, change the file extension to csv.
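
The resulting file can be read with any standard CSV parser. The Python sketch below assumes the four-column layout produced by the template above (position, link, anchor, snippet) and the double-quote quoting visible in the example result; the `SourceRow` type and `read_sources` function are hypothetical names used for illustration.

```python
import csv
import io
from typing import NamedTuple


class SourceRow(NamedTuple):
    position: int
    link: str
    anchor: str
    snippet: str


def read_sources(text: str) -> list[SourceRow]:
    """Parse CSV text with position, link, anchor, snippet columns."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        # Skip blank lines and anything that is not a complete 4-column row.
        if len(record) != 4 or not record[0].isdigit():
            continue
        rows.append(SourceRow(int(record[0]), *record[1:]))
    return rows


sample_csv = """\
...
6,https://www.kraken.com/ru/learn/what-is-bitcoin-btc,"What is Bitcoin (BTC)? Complete guide - Kraken","Learn about Bitcoin's decentralized nature, limited supply, and its role as a digital currency. Find out what powers BTC, what its core principles and use cases are."
7,https://www.vedomosti.ru/finance/articles/2024/09/23/1064026-bitkoin,"What is Bitcoin and why is it needed - Vedomosti","It is a digital currency used as a means of payment and a financial asset"
"""

rows = read_sources(sample_csv)
for row in rows:
    print(row.position, row.link)
```

Using a real CSV parser rather than splitting on commas matters here because snippets routinely contain commas inside quoted fields.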

Outputting the question, answer, and a list of similar questions in JSON

General result format:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.answer = p1.answer;
obj.related = [];

FOREACH item IN p1.related;
obj.related.push(item.text);
END;

obj.json %]

Initial text:

[

Final text:

]

Example result:

[{"related":["Why Bitcoin is considered the first cryptocurrency and how it differs from traditional money","How the blockchain technology underlying Bitcoin works","What cryptographic methods protect transactions in the Bitcoin system","How the 21 million coin limit makes Bitcoin a unique asset","What advantages decentralization and the absence of intermediaries provide when using Bitcoin"],"answer":"**Bitcoin** (Bitcoin, BTC) is the first and most famous cryptocurrency, representing a decentralized digital payment system based on blockchain technology. In this system, all transactions are recorded in a public ledger (blockchain), which is protected by cryptographic methods and is accessible for verification by any network participant[1][3][4].\n...","query":"What is Bitcoin?"},{"related":["What are the basic rules and tips that help you google correctly","Why it is important to avoid questions and complex sentences when searching","How to use the English language for more effective searching on Google","What operators and symbols help expand or refine a search","How using quotation marks and a tilde differs when searching for information"],"answer":"## How to Google correctly: basic tips\n\n**Formulate queries briefly and to the point**\n- Use 26 keywords, avoid long questions and complex sentences. For example, instead of \"What should I do if the internet is not working on my windows computer?\" use \"internet not working windows how to fix\"[1].\n\n**Search for exact phrases**\n...","query":"How to Google correctly?"}]
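
Once the task has finished and the closing bracket has been written, the file is a single JSON array that any JSON library can load. The Python sketch below builds a mapping from each query to its related questions; the field names follow the template above, while the function name and the shortened sample record are assumptions made for illustration.

```python
import json


def related_by_query(text: str) -> dict[str, list[str]]:
    """Map each query to its list of related questions."""
    records = json.loads(text)
    return {record["query"]: record.get("related", []) for record in records}


# Shortened sample record; a raw string keeps the JSON escapes (\n) intact.
sample_json = r'''[{"related":["How the blockchain technology underlying Bitcoin works","What cryptographic methods protect transactions in the Bitcoin system"],"answer":"**Bitcoin** (Bitcoin, BTC) is the first and most famous cryptocurrency\n...","query":"What is Bitcoin?"}]'''

mapping = related_by_query(sample_json)
for query, related in mapping.items():
    print(query)
    for question in related:
        print("  -", question)
```

In practice the text would come from the result file, for example via `Path("result.json").read_text(encoding="utf-8")`, where the file name is whatever was set in the task.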

Possible settings

| Parameter Name | Default Value | Description |
|----------------|---------------|-------------|
| Sources | Web | Type of information source (multiple selection supported) |
| Use sessions | | Saves good sessions, allowing for faster scraping with fewer errors |
| Bypass CloudFlare | | Automatic bypass of CloudFlare protection |
| Bypass CloudFlare Browser Max Pages | 10 | Max number of pages when bypassing CF |
| Bypass CloudFlare Browser Headless | | If enabled, the browser will not be displayed during CF bypass |