Reddit::Posts - Reddit post scraper
Overview of Reddit::Posts scraper
Reddit::Posts is a Reddit post scraper. It collects a list of posts, along with detailed information about each one, from Reddit.
You can use automatic query multiplication, substitution of subqueries from files, iteration over alphanumeric combinations and lists to obtain the maximum possible number of results.
A-Parser lets you save Reddit::Posts settings as presets for future use, set a scraping schedule, and much more.
Results can be saved in whatever form and structure you need, thanks to the built-in Template Toolkit templating engine, which lets you apply additional logic to the results and output data in various formats, including JSON, SQL, and CSV.
Data Collected
Array of posts:
- Link to the post
- Title and flair
- Rating, number of comments, and number of awards
- Creation date
- Community where the post was published
- Author and their flair
- Post content: text in markdown, link to media content, and link to an external resource
- Whether the post is promotional
Features
- Specifying the number of pages for scraping
- Specifying the method of sorting results
- Choosing the time range of results
- Ability to scrape within a specific community
Use Cases
- Any scenario that requires data about posts on Reddit
Queries
Several types of queries are supported:
Links to topics
Example:
https://www.reddit.com/t/bitcoin/
https://www.reddit.com/t/kim_kardashian/
By default, the result will display a list of links to posts, for example:
https://www.reddit.com/r/Bitcoin/comments/14nbyy2/i_took_out_a_35000_loan_to_buy_bitcoin_1_year/
https://www.reddit.com/r/CryptoCurrency/comments/14guprs/bitcoin_is_up_75_since_jim_cramer_told_investors/
https://www.reddit.com/r/Bitcoin/comments/14opp2t/this_guy_was_paid_32_bitcoin_to_hold_up_this_sign/
https://www.reddit.com/r/CryptoCurrency/comments/14ivx43/nearly_69_of_all_bitcoin_supply_did_not_move_in/
https://www.reddit.com/r/CryptoCurrency/comments/149vy0o/bitcoin_dips_below_25k_for_the_first_time_in_3/
...
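Because the default output is a plain list of post links, it can be post-processed outside A-Parser. The following is a minimal Python sketch, not part of the scraper, that splits one result link into its parts. It assumes the link shapes seen in the sample outputs on this page (`/r/<community>/comments/<id>/<slug>/` and `/user/<name>/comments/<id>/<slug>/`); the function name `parse_post_link` and the field names are hypothetical.

```python
import re

# Pattern inferred from the sample result links on this page (an assumption,
# not an official Reddit or A-Parser specification).
POST_LINK = re.compile(
    r"^https://www\.reddit\.com/(?P<kind>r|user)/(?P<name>[^/]+)"
    r"/comments/(?P<post_id>[^/]+)/(?:(?P<slug>[^/]+)/?)?$"
)


def parse_post_link(url):
    """Return the parts of a post link as a dict, or None if it does not match."""
    match = POST_LINK.match(url.strip())
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    link = "https://www.reddit.com/r/Bitcoin/comments/14nbyy2/i_took_out_a_35000_loan_to_buy_bitcoin_1_year/"
    print(parse_post_link(link))
```

The `slug` field is `None` for links that end right after the post ID. Richer per-post data is better taken from the scraper's own result fields through a custom result format.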
Links to communities
Sorting and time-range parameters contained in the link are honored, and the corresponding values in the scraper settings are ignored for that query. Example:
https://www.reddit.com/r/nba/
https://www.reddit.com/r/OrlandoMagic/top/?t=month
By default, the result will display a list of links to posts, for example:
https://www.reddit.com/r/OrlandoMagic/comments/14a5br2/
https://www.reddit.com/r/OrlandoMagic/comments/14nqfk1/keep_mo_or_no_mo/
https://www.reddit.com/r/nba/comments/14nfzki/202324_nba_free_agent_tracker/
https://www.reddit.com/user/Grammarly/comments/14ghtld/verbessere_deine_schreibfertigkeit_auf_englisch/
https://www.reddit.com/r/nba/comments/14r4l4s/vernon_dillon_brooks_took_991_shots_last_year_he/
https://www.reddit.com/r/nba/comments/14ql1es/highlight_matt_devlin_inexplicably_yells_punjabi/
https://www.reddit.com/user/TelekomShop/comments/yqkina/der_highspeedhotspot_zum_mitnehmen_die_speedbox/
https://www.reddit.com/r/nba/comments/14qysvi/michael_jordan_with_the_spin_hanging_onehanded/
https://www.reddit.com/r/nba/comments/14qxrep/dwyane_wade_leads_the_redeem_team_with_27_points/
...
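To check which sorting and time range a community link carries before submitting it as a query, the link can be inspected with standard URL parsing. The sketch below is only an illustration in Python. The helper name `describe_community_link` and the `KNOWN_SORTS` set are assumptions, and the scraper's own parsing logic is not documented here.

```python
from urllib.parse import urlparse, parse_qs

# Sort names that may follow the community name in the path.
# This set is an assumption for illustration, not taken from the scraper.
KNOWN_SORTS = {"hot", "new", "top", "rising", "controversial"}


def describe_community_link(url):
    """Return the community, sort, and time range encoded in a community link."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "r":
        raise ValueError(f"not a community link: {url}")
    # The sort, if present, is the path segment right after the community name.
    sort = segments[2] if len(segments) > 2 and segments[2] in KNOWN_SORTS else None
    # The time range, if present, is the "t" query parameter.
    time_range = parse_qs(parsed.query).get("t", [None])[0]
    return {"community": segments[1], "sort": sort, "time": time_range}


if __name__ == "__main__":
    print(describe_community_link("https://www.reddit.com/r/OrlandoMagic/top/?t=month"))
    print(describe_community_link("https://www.reddit.com/r/nba/"))
```

A link such as `https://www.reddit.com/r/nba/` carries neither value, so both fields come back as `None`.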
Keywords
Example:
wordpress features
parser
By default, the result will display a list of links to posts, for example:
https://www.reddit.com/r/ShitpostXIV/comments/14511em/i_am_a_proud_grey_parser/
https://www.reddit.com/r/opengl/comments/147sbjk/4_hours_of_my_obj_parser_so_far/
https://www.reddit.com/r/Compilers/comments/14pi9xh/demystifying_pratt_parsers/
https://www.reddit.com/r/ZETTAHOST/comments/11qdg99/how_to_change_the_wordpress_featured_image_size/
https://www.reddit.com/r/Wordpress/comments/14p1k2p/what_features_is_wordpress_missing_i_want_to_help/
https://www.reddit.com/r/Wordpress/comments/13q8g5x/is_it_possible_and_advisable_to_build_a_website/
...
Keywords and links to communities
The scraper supports searching by keyword within a specific community. To do this, the query must specify the keyword and a link to the community separated by a space. Example:
jesus https://www.reddit.com/r/atheism/
stage 3 https://www.reddit.com/r/Audi/
By default, the result will display a list of links to posts, for example:
https://www.reddit.com/r/Audi/comments/vi6cs5/thoughts_on_used_stage_3_2017_a3/
https://www.reddit.com/r/Audi/comments/lfvjuo/just_picked_up_this_beauty_stage_3_b5_s4/
https://www.reddit.com/r/Audi/comments/ssr8ui/anyone_else_track_their_audis_ttrs_stage_3_big/
https://www.reddit.com/r/atheism/comments/14lq0y6/heaven_and_hell_are_not_what_jesus_preached/
https://www.reddit.com/r/atheism/comments/13gxzj6/so_jesus_freaks_can_shove_their_religion_onto/
https://www.reddit.com/r/atheism/comments/13b8kl6/chris_pratt_compares_his_struggles_to_jesus/
https://www.reddit.com/r/atheism/comments/137k88b/artwork_of_jesus_surrounded_by_hot_leather/
...
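Query lines in the "keyword, space, community link" form described above can be generated in bulk before being loaded into the scraper. The following Python sketch is one way to do that. The function names `community_query` and `build_queries` are hypothetical, and the prefix check only reflects the community link shape used in the examples on this page.

```python
REDDIT_COMMUNITY_PREFIX = "https://www.reddit.com/r/"


def community_query(keyword, community_url):
    """Join a keyword and a community link into one query line."""
    keyword = " ".join(keyword.split())  # collapse stray whitespace
    if not keyword:
        raise ValueError("keyword must not be empty")
    if not community_url.startswith(REDDIT_COMMUNITY_PREFIX):
        raise ValueError(f"not a community link: {community_url}")
    return f"{keyword} {community_url}"


def build_queries(keywords, community_urls):
    """Pair every keyword with every community, one query per list item."""
    return [community_query(k, u) for k in keywords for u in community_urls]


if __name__ == "__main__":
    queries = build_queries(
        ["jesus", "stage 3"],
        ["https://www.reddit.com/r/atheism/", "https://www.reddit.com/r/Audi/"],
    )
    print("\n".join(queries))
```

Keywords may themselves contain spaces, as in `stage 3`, so the community link is always placed last on the line.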
Result Output Options
A-Parser supports flexible formatting of results thanks to the built-in templating engine Template Toolkit, which allows it to output results in any form, as well as in structured formats, such as CSV or JSON.
Possible Settings
Parameter | Default Value | Description |
---|---|---|
Pages count | 5 | Number of search result pages |
Sort | Relevance | Sorting of results |
Time | All time | Time range of results |