As the source of domains we use the Alexa top million list, which can be downloaded here: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip

Initial data: a server with a quad-core Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz, 8 GB RAM and 100 Mbit/s bandwidth.

In the parser settings we set the use of 6 CPU cores: with Hyper-Threading the processor exposes 8 logical cores, and 2 of them are left free for stable system operation.

[Screenshot: task settings]

The source file with queries contains data in the format <alexa-rank>,<domain>, one million records, each domain on a new line. Using the Queries Builder we separate the domain from its rank. The Rank::CMS parser needs a full link to the site or page, so we add http:// in the query format.

We use the Rank::CMS parser with default settings and specify that parsing runs without a proxy, with at most 3 attempts per query.

For convenience we save the result in two formats:
- in the top-1m-cms.txt file we write the domain, Alexa Rank and the name of the CMS;
- in the top-1m/ folder we save the domains, automatically sorted into files named after the CMS (i.e. the WordPress.txt file contains only domains running WordPress, and likewise for every detected CMS).

By default the check covers all CMSs, forum engines and wiki engines.

[Screenshot: result of the task]

Statistics:
- Parsing speed is 1100 domains per minute
- In total, 301841 of 1000000 domains (about 30%) were identified as using one of the popular CMSs, forum engines or wikis on the homepage
- 126 different CMSs were detected

Top 10 most popular CMSs; the first value is the number of domains:
Code:
209855 WordPress
23732 Joomla
22945 Drupal
6488 TYPO3 CMS
4917 vBulletin
3726 1C-Bitrix
2515 phpBB
2415 ExpressionEngine
2022 DataLife Engine
1928 Microsoft SharePoint

Code for import:
Code:
eyJwcmVzZXQiOiJSYW5rIENNUyBBbGV4YSB0b3AtMWtrIiwidmFsdWUiOnsicGFy
c2VycyI6W1siUmFuazo6Q01TIiwiZGVmYXVsdCIseyJ0eXBlIjoib3ZlcnJpZGUi
LCJpZCI6InVzZXByb3h5IiwidmFsdWUiOmZhbHNlfSx7InR5cGUiOiJvdmVycmlk
ZSIsImlkIjoicHJveHlyZXRyaWVzIiwidmFsdWUiOiIzIn1dXSwicmVzdWx0c0Zv
cm1hdCI6IiRxdWVyeTskcXVlcnkuYWxleGE7JHAxLmNtc1xcbiIsInJlc3VsdHNT
YXZlVG8iOiJmaWxlIiwicmVzdWx0c0ZpbGVOYW1lIjoidG9wLTFtLWNtcy50eHQi
LCJhZGRpdGlvbmFsRm9ybWF0cyI6W1sidG9wLTFtLyR7cDEuY21zfS50eHQiLCIk
cXVlcnlcXG4iXV0sInJlc3VsdHNVbmlxdWUiOiJubyIsInF1ZXJ5Rm9ybWF0Ijoi
aHR0cDovLyRxdWVyeSIsInVuaXF1ZVF1ZXJpZXMiOmZhbHNlLCJzYXZlRmFpbGVk
UXVlcmllcyI6ZmFsc2UsImRvTG9nIjoibm8iLCJrZWVwVW5pcXVlIjoiTm8iLCJt
b3JlT3B0aW9ucyI6ZmFsc2UsInJlc3VsdHNQcmVwZW5kIjoiIiwicmVzdWx0c0Fw
cGVuZCI6IiIsInF1ZXJ5QnVpbGRlcnMiOlt7InNvdXJjZSI6InF1ZXJ5IiwidHlw
ZSI6InN0cmluZ1NwbGl0Iiwic2VwYXJhdG9yIjoiLCIsInRvIjpbImFsZXhhIiwi
cXVlcnkiXX1dLCJyZXN1bHRzQnVpbGRlcnMiOltdLCJjb25maWdPdmVycmlkZXMi
OltdfX0=

Result files:
- File with the source domains, Alexa Rank and detected CMS: top-1m-cms.txt, 37 MB
- Archive with the files sorted by CMS: top-1m.zip, 7.6 MB

Parsing speed can be increased considerably by reducing the number of CMSs checked. In an example task where only WordPress is checked (shown in the screenshot), speed increased more than 8 times, and the server still has enough resources to raise the number of threads further. Such a task will be completed in only 2 hours.
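The run times implied by these figures can be checked with a quick calculation. This is only a rough sketch in Python: the constant and function names are made up for illustration, and the speedup is taken as exactly 8, the lower bound of "more than 8 times".

```python
# Back-of-the-envelope run-time estimate from the figures quoted in the post.
TOTAL_DOMAINS = 1_000_000        # size of the Alexa list
SPEED_ALL_CMS = 1100             # domains per minute with every CMS checked
SPEEDUP_WORDPRESS_ONLY = 8       # assumption: lower bound of "more than 8 times"


def run_time_hours(total_domains: int, domains_per_minute: float) -> float:
    """Convert a domain count and a per-minute speed into hours of work."""
    return total_domains / domains_per_minute / 60


full_check_hours = run_time_hours(TOTAL_DOMAINS, SPEED_ALL_CMS)
wordpress_only_hours = run_time_hours(
    TOTAL_DOMAINS, SPEED_ALL_CMS * SPEEDUP_WORDPRESS_ONLY
)

print(f"all CMSs checked:    {full_check_hours:.1f} h")
print(f"WordPress only (8x): {wordpress_only_hours:.1f} h")
```

Under these assumptions the full check takes roughly 15 hours and the WordPress-only task a little under 2 hours, which matches the 2-hour figure above.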
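For readers who want to see what the query and result formats described above amount to, here is a minimal Python sketch. It is not the parser's own code: the names `Query`, `build_query`, `format_result` and `per_cms_file` are hypothetical, the `;` separator in the result line is an assumption, and the sample domains are placeholders.

```python
from typing import NamedTuple


class Query(NamedTuple):
    alexa: str    # Alexa rank, kept as text exactly as it appears in the CSV
    domain: str   # bare domain from the source file
    url: str      # full link handed to the CMS check


def build_query(line: str) -> Query:
    """Split one '<alexa-rank>,<domain>' record and prepend http://."""
    rank, domain = line.strip().split(",", 1)
    return Query(alexa=rank, domain=domain, url=f"http://{domain}")


def format_result(query: Query, cms: str) -> str:
    """One line for top-1m-cms.txt: domain, Alexa Rank, CMS name."""
    # Assumption: fields are separated by ';'.
    return f"{query.domain};{query.alexa};{cms}\n"


def per_cms_file(cms: str) -> str:
    """Path of the per-CMS file inside the top-1m/ folder."""
    return f"top-1m/{cms}.txt"


if __name__ == "__main__":
    q = build_query("42,example.org\n")   # placeholder record
    print(q.url)
    print(format_result(q, "WordPress"), end="")
    print(per_cms_file("WordPress"))
```

Keeping the rank as a separate field next to the domain is what lets it be written back into the result line after the CMS has been detected.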