
Reddit::Posts - Reddit post scraper


Overview of Reddit::Posts scraper

Reddit::Posts is a scraper for Reddit posts.

It collects a list of posts from Reddit, along with detailed information about each of them.

To obtain the maximum possible number of results, you can use automatic query multiplication, substitution of subqueries from files, and iteration over alphanumeric combinations and lists.
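The idea of iterating over alphanumeric combinations can be illustrated outside A-Parser with a short Python sketch. It is not A-Parser's implementation or syntax; the function name `expand_queries` and its parameters are hypothetical and only show how a few base queries grow into many.

```python
from itertools import product
from string import ascii_lowercase, digits


def expand_queries(base_queries, suffix_length=1, alphabet=ascii_lowercase + digits):
    """Append every combination of `suffix_length` characters to each base query.

    Hypothetical helper for illustration only; A-Parser performs this
    kind of expansion internally through its own query settings.
    """
    expanded = []
    for query in base_queries:
        for combo in product(alphabet, repeat=suffix_length):
            expanded.append(f"{query} {''.join(combo)}")
    return expanded


if __name__ == "__main__":
    queries = expand_queries(["wordpress features", "parser"])
    # Each base query is combined with 26 letters + 10 digits.
    print(len(queries))
    print(queries[:3])
```

Each additional suffix character multiplies the number of queries by the alphabet size, so longer suffixes quickly produce very large query sets.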

A-Parser lets you save the Reddit::Posts scraper settings for future use (presets), set a scraping schedule, and much more.

Results can be saved in whatever form and structure you need. The built-in Template Toolkit templating engine lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.

Data Collected

Array of posts:

  • Link to the post
  • Title and flair
  • Rating, number of comments, and number of awards
  • Creation date
  • Community where the post was published
  • Author and their flair
  • Post content: text in markdown, link to media content, and link to an external resource
  • Whether the post is promotional
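As a rough picture of the record described by the list above, here is a minimal Python sketch. Every field name is an assumption chosen for illustration; none of them are the variable names that the scraper actually exposes, and all values in the example are placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RedditPost:
    # Field names are hypothetical; they mirror the list of collected data,
    # not the scraper's real result variables.
    link: str
    title: str
    flair: Optional[str]
    rating: int
    comments_count: int
    awards_count: int
    created: str
    community: str
    author: str
    author_flair: Optional[str]
    text_markdown: Optional[str]
    media_link: Optional[str]
    external_link: Optional[str]
    is_promotional: bool


# Placeholder values only; this is not real scraped data.
post = RedditPost(
    link="https://www.reddit.com/r/nba/comments/14nfzki/202324_nba_free_agent_tracker/",
    title="placeholder title",
    flair=None,
    rating=0,
    comments_count=0,
    awards_count=0,
    created="placeholder date",
    community="nba",
    author="placeholder_author",
    author_flair=None,
    text_markdown=None,
    media_link=None,
    external_link=None,
    is_promotional=False,
)

if __name__ == "__main__":
    print(json.dumps(asdict(post), indent=2))
```

The content fields are optional because a post may carry text, media, an external link, or any combination of them.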

Features

  • Specifying the number of pages for scraping
  • Specifying the method of sorting results
  • Choosing the time period of results
  • Ability to scrape within a specific community

Use Cases

  • Any scenarios where it is necessary to obtain data about posts on Reddit

Queries

Several types of queries are supported: links to topics or communities, and keywords.

Example:

https://www.reddit.com/t/bitcoin/
https://www.reddit.com/t/kim_kardashian/

By default, the result is a list of links to posts, for example:

https://www.reddit.com/r/Bitcoin/comments/14nbyy2/i_took_out_a_35000_loan_to_buy_bitcoin_1_year/
https://www.reddit.com/r/CryptoCurrency/comments/14guprs/bitcoin_is_up_75_since_jim_cramer_told_investors/
https://www.reddit.com/r/Bitcoin/comments/14opp2t/this_guy_was_paid_32_bitcoin_to_hold_up_this_sign/
https://www.reddit.com/r/CryptoCurrency/comments/14ivx43/nearly_69_of_all_bitcoin_supply_did_not_move_in/
https://www.reddit.com/r/CryptoCurrency/comments/149vy0o/bitcoin_dips_below_25k_for_the_first_time_in_3/
...

Sorting and time period parameters contained in a link are also taken into account. When they are present, the corresponding values set in the scraper settings are ignored. Example:

https://www.reddit.com/r/nba/
https://www.reddit.com/r/OrlandoMagic/top/?t=month

By default, the result is a list of links to posts, for example:

https://www.reddit.com/r/OrlandoMagic/comments/14a5br2/
https://www.reddit.com/r/OrlandoMagic/comments/14nqfk1/keep_mo_or_no_mo/
https://www.reddit.com/r/nba/comments/14nfzki/202324_nba_free_agent_tracker/
https://www.reddit.com/user/Grammarly/comments/14ghtld/verbessere_deine_schreibfertigkeit_auf_englisch/
https://www.reddit.com/r/nba/comments/14r4l4s/vernon_dillon_brooks_took_991_shots_last_year_he/
https://www.reddit.com/r/nba/comments/14ql1es/highlight_matt_devlin_inexplicably_yells_punjabi/
https://www.reddit.com/user/TelekomShop/comments/yqkina/der_highspeedhotspot_zum_mitnehmen_die_speedbox/
https://www.reddit.com/r/nba/comments/14qysvi/michael_jordan_with_the_spin_hanging_onehanded/
https://www.reddit.com/r/nba/comments/14qxrep/dwyane_wade_leads_the_redeem_team_with_27_points/
...
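To see which part of a link carries the sorting and time period, the following Python sketch pulls both values out of a community URL. It is an illustration only, not the scraper's own parsing logic. The function name `link_overrides` is hypothetical, and the set of sort names is an assumption based on common Reddit listing URLs.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: sort names as they commonly appear in Reddit listing URLs.
KNOWN_SORTS = {"hot", "new", "top", "rising", "controversial"}


def link_overrides(url):
    """Return the sort and time period encoded in a community link, if any."""
    parsed = urlparse(url)
    segments = [part for part in parsed.path.split("/") if part]
    # Expect /r/<community>/<sort>/, so the sort is the third path segment.
    sort = None
    if len(segments) >= 3 and segments[-1] in KNOWN_SORTS:
        sort = segments[-1]
    # The time period is passed in the "t" query parameter.
    time_period = parse_qs(parsed.query).get("t", [None])[0]
    return {"sort": sort, "time": time_period}


if __name__ == "__main__":
    for link in (
        "https://www.reddit.com/r/nba/",
        "https://www.reddit.com/r/OrlandoMagic/top/?t=month",
    ):
        print(link, link_overrides(link))
```

For the first link both values are empty, so the settings apply; for the second, the link itself supplies `top` and `month`.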

Keywords

Example:

wordpress features
parser

By default, the result is a list of links to posts, for example:

https://www.reddit.com/r/ShitpostXIV/comments/14511em/i_am_a_proud_grey_parser/
https://www.reddit.com/r/opengl/comments/147sbjk/4_hours_of_my_obj_parser_so_far/
https://www.reddit.com/r/Compilers/comments/14pi9xh/demystifying_pratt_parsers/
https://www.reddit.com/r/ZETTAHOST/comments/11qdg99/how_to_change_the_wordpress_featured_image_size/
https://www.reddit.com/r/Wordpress/comments/14p1k2p/what_features_is_wordpress_missing_i_want_to_help/
https://www.reddit.com/r/Wordpress/comments/13q8g5x/is_it_possible_and_advisable_to_build_a_website/
...
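If the default list of links is processed further outside A-Parser, each link can be split into its parts. The sketch below is a minimal Python example based only on the link shapes shown in the results above (`/r/<community>/comments/<id>/<slug>/`, `/user/<name>/comments/<id>/<slug>/`, and links without a slug). The function name `parse_post_link` and the returned keys are hypothetical.

```python
import re

# Pattern derived from the example result links; the slug part is optional.
POST_LINK = re.compile(
    r"^https://www\.reddit\.com/(r|user)/([^/]+)/comments/([^/]+)/(?:([^/]+)/)?$"
)


def parse_post_link(link):
    """Split a post link into owner type, owner name, post ID, and slug."""
    match = POST_LINK.match(link)
    if match is None:
        return None
    kind, name, post_id, slug = match.groups()
    return {
        "owner_type": "community" if kind == "r" else "user",
        "owner": name,
        "post_id": post_id,
        "slug": slug,  # None when the link has no slug
    }


if __name__ == "__main__":
    print(parse_post_link(
        "https://www.reddit.com/r/Compilers/comments/14pi9xh/demystifying_pratt_parsers/"
    ))
    print(parse_post_link("https://www.reddit.com/r/OrlandoMagic/comments/14a5br2/"))
```

Links that do not match the pattern return `None`, which makes unexpected formats easy to spot.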

The scraper supports searching by keyword within a specific community. To do this, the query must specify the keyword and a link to the community separated by a space. Example:

jesus https://www.reddit.com/r/atheism/
stage 3 https://www.reddit.com/r/Audi/

By default, the result is a list of links to posts, for example:

https://www.reddit.com/r/Audi/comments/vi6cs5/thoughts_on_used_stage_3_2017_a3/
https://www.reddit.com/r/Audi/comments/lfvjuo/just_picked_up_this_beauty_stage_3_b5_s4/
https://www.reddit.com/r/Audi/comments/ssr8ui/anyone_else_track_their_audis_ttrs_stage_3_big/
https://www.reddit.com/r/atheism/comments/14lq0y6/heaven_and_hell_are_not_what_jesus_preached/
https://www.reddit.com/r/atheism/comments/13gxzj6/so_jesus_freaks_can_shove_their_religion_onto/
https://www.reddit.com/r/atheism/comments/13b8kl6/chris_pratt_compares_his_struggles_to_jesus/
https://www.reddit.com/r/atheism/comments/137k88b/artwork_of_jesus_surrounded_by_hot_leather/
...
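The "keyword, space, community link" format described above can be generated and checked with a few lines of Python when preparing a query file. This is a sketch under assumptions: the helper names `community_query` and `split_query` are hypothetical, and the split only distinguishes keyword queries; it does not reproduce how the scraper itself interprets a query.

```python
COMMUNITY_PREFIX = "https://www.reddit.com/r/"


def community_query(keyword, community):
    """Build a query that searches for `keyword` inside one community."""
    return f"{keyword} {COMMUNITY_PREFIX}{community}/"


def split_query(query):
    """Split a query into (keyword, community_link).

    The community link is None when the query is a plain keyword.
    """
    keyword, separator, last = query.rpartition(" ")
    if separator and last.startswith(COMMUNITY_PREFIX):
        return keyword, last
    return query, None


if __name__ == "__main__":
    print(community_query("stage 3", "Audi"))
    print(split_query("jesus https://www.reddit.com/r/atheism/"))
    print(split_query("wordpress features"))
```

Splitting on the last space keeps multi-word keywords such as `stage 3` intact.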

Result Output Options

A-Parser formats results with the built-in Template Toolkit templating engine, so results can be output in any form, including structured formats such as CSV or JSON.

Possible Settings

Parameter      Default Value    Description
Pages count    5                Number of search result pages
Sort           Relevance        Sorting of results
Time           All time         Time period of results