Collection of recipes #1: determine the CMS, check availability of mobile version and collect emails

Discussion in 'News' started by Support, Nov 3, 2015.

    For the raffle prizes and the terms of the campaign, see the end of the article!

    This post starts a series of articles with recipes for using A-Parser: complex examples that combine different features of the parser.

    Determine the CMS for 1000000 domains over 15 hours
    This example shows how to determine the engine used by sites from the Alexa top one million list; the results are automatically sorted into files named after the CMS. It also shows how to increase the processing speed and check 1 million domains in just 2 hours.
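As a rough illustration of the idea (not the A-Parser preset itself), the sketch below detects a CMS from homepage HTML by matching a few signature patterns, then groups domains under the detected CMS name, similar in spirit to sorting results into per-CMS files. The signature table, the `detect_cms` and `group_by_cms` names, and the sample pages are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny signature table: CMS name -> regex
# patterns searched in the raw homepage HTML. A real detector would
# need many more signatures.
SIGNATURES = {
    "WordPress": [r'<meta[^>]+name=["\']generator["\'][^>]+WordPress', r"/wp-content/"],
    "Joomla": [r'<meta[^>]+name=["\']generator["\'][^>]+Joomla'],
    "Drupal": [r'<meta[^>]+name=["\']generator["\'][^>]+Drupal', r"/sites/default/files/"],
}


def detect_cms(html):
    """Return the first CMS whose signature matches, or None."""
    for cms, patterns in SIGNATURES.items():
        if any(re.search(p, html, re.IGNORECASE) for p in patterns):
            return cms
    return None


def group_by_cms(pages):
    """Map CMS name -> list of domains; pages is {domain: html}."""
    groups = defaultdict(list)
    for domain, html in pages.items():
        cms = detect_cms(html)
        if cms is not None:
            groups[cms].append(domain)
    return dict(groups)


# Made-up sample pages; in practice the HTML would be downloaded.
SAMPLE_PAGES = {
    "blog.example": '<meta name="generator" content="WordPress 4.3">',
    "shop.example": '<link href="/sites/default/files/style.css">',
    "plain.example": "<html><body>Hello</body></html>",
}

if __name__ == "__main__":
    for cms, domains in group_by_cms(SAMPLE_PAGES).items():
        # Each group could be written to a file named after the CMS.
        print(cms, domains)
```

Pages that match no signature are simply left out of the result, which mirrors the fact that only part of the checked domains ends up with a detected CMS.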
    Statistics:
    • Parsing speed: 1100 domains per minute
    • A popular CMS, forum or wiki engine was detected on the homepage of 301841 of the 1000000 domains
    • 126 different CMS were detected
    • Top 10 most popular CMS (the first value is the number of domains):
    Code:
    209855 WordPress
    23732 Joomla
    22945 Drupal
    6488 TYPO3 CMS
    4917 vBulletin
    3726 1C-Bitrix
    2515 phpBB
    2415 ExpressionEngine
    2022 DataLife Engine
    1928 Microsoft SharePoint
    More...
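The figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in this post to derive the approximate run time from the parsing speed and the share of domains with a detected CMS; the function names are made up for illustration.

```python
def hours_needed(total_items, items_per_minute):
    """Run time in hours at a constant processing speed."""
    return total_items / items_per_minute / 60


def share_percent(part, whole):
    """Percentage that `part` makes up of `whole`."""
    return 100 * part / whole


if __name__ == "__main__":
    # 1,000,000 domains at 1,100 domains per minute: roughly 15 hours.
    print(round(hours_needed(1_000_000, 1100), 1))
    # Domains with a detected CMS, as a share of all checked domains.
    print(round(share_percent(301_841, 1_000_000), 1))
    # WordPress domains as a share of domains with a detected CMS.
    print(round(share_percent(209_855, 301_841), 1))
```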


    Check availability of the mobile version for 1000000 websites
    This recipe works with large volumes of data and shows how to look for matches in raw data.
    • In 8 hours of running this task, we learned that almost 41% of the most visited sites do not have a mobile version. Who knows, maybe with a mobile version they would get even more visitors?
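One simple way to look for a match in raw page data is to search the HTML for markers that usually accompany a mobile-friendly layout. The sketch below is a heuristic under stated assumptions, not the check used in the recipe: it treats a viewport meta tag, or a `rel="alternate"` link with a `media` attribute, as a sign of a mobile version. The marker list and the function names are hypothetical.

```python
import re

# Assumed markers of a mobile version (heuristic, not exhaustive):
# a responsive viewport meta tag, or a link to a separate mobile
# page declared with rel="alternate" and a media query.
MOBILE_MARKERS = [
    re.compile(r'<meta[^>]+name=["\']viewport["\']', re.IGNORECASE),
    re.compile(r'<link[^>]+rel=["\']alternate["\'][^>]+media=', re.IGNORECASE),
]


def has_mobile_version(html):
    """Return True if any assumed mobile marker occurs in the raw HTML."""
    return any(marker.search(html) for marker in MOBILE_MARKERS)


def share_without_mobile(pages):
    """Percentage of pages (domain -> html) with no mobile marker."""
    if not pages:
        return 0.0
    missing = sum(1 for html in pages.values() if not has_mobile_version(html))
    return 100 * missing / len(pages)
```

Because the check is a plain pattern search over the downloaded HTML, it stays cheap enough to run over a very large list of sites, at the cost of some false positives and negatives.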
    More...


    Collect 1.65 million emails from contact pages in 2.5 hours
    Parse links to pages with contact information, then collect the email addresses from those pages.
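The two steps described here, finding contact-page links and then pulling email addresses from those pages, might look like the following sketch. The regular expressions are simplified assumptions (real-world links and addresses are messier), and the function names are invented for this example.

```python
import re

# Simplified email pattern; it will miss some valid addresses and
# is an assumption for this sketch, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Links whose URL mentions "contact" (assumed heuristic).
CONTACT_LINK_RE = re.compile(r'href=["\']([^"\']*contact[^"\']*)["\']', re.IGNORECASE)


def find_contact_links(html):
    """Return href values that look like links to a contact page."""
    return CONTACT_LINK_RE.findall(html)


def extract_emails(text):
    """Return unique lowercase email addresses, in order of appearance."""
    seen = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.append(email)
    return seen
```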
    • The average speed of processing was 12,000 links per minute
    • Top 10 email domains:
    Code:
    249772 mail.ru
    129894 gmail.com
    91901 yandex.ru
    25625 rambler.ru
    20821 bk.ru
    19773 hotmail.com
    14656 yahoo.com
    14117 list.ru
    13636 inbox.ru
    11670 ukr.net
    
    More...
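A ranking like the one above can be produced by counting the domain part of each collected address. A minimal sketch, with made-up sample addresses and a hypothetical function name:

```python
from collections import Counter


def top_email_domains(emails, n=10):
    """Return the n most common email domains as (domain, count) pairs."""
    domains = (e.rsplit("@", 1)[1].lower() for e in emails if "@" in e)
    return Counter(domains).most_common(n)


if __name__ == "__main__":
    sample = ["a@mail.ru", "b@gmail.com", "c@mail.ru", "d@Mail.ru"]
    for domain, count in top_email_domains(sample):
        # Same layout as the listing: count first, then the domain.
        print(count, domain)
```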

    Almost 3 months ago, we launched a paid service for compiling A-Parser tasks. It has proved popular both with new users of the parser and with those who do not have time to study it. Since then, more than 70 tasks have been compiled, and 75% of them consisted of 2 or more presets. The average time to complete a single order, from agreeing on the details to the finished set of presets, is about 4 hours. Each preset is thoroughly tested, and the results are discussed with the client.

    Since A-Parser is a program for parsing all kinds of information, on November 26, the International Day of Information, we will raffle off:
    • 5 proxy packages (100 threads per month)
    • 3 free compilations of 1 A-Parser task each
    Everyone who retweets our news about this campaign on Twitter takes part. Subscribe to our Twitter feeds, Russian-language @a_parser and English-language @a_parser_en, and watch for news on the site! The winners will be chosen at random using the random.org service. A video of the winner selection will be posted along with the results of the campaign.
     
    Last edited by a moderator: Nov 4, 2015
