Skip to main content

Rank::CMS - detecting over 600 types of CMS based on features. Detects all popular forums, blogs, CMS, guestbooks, wikis, and many other engine types

Overview of the scraper

Overview of the scraperRank::CMSRank::CMS – identifies over 600 types of CMS based on features. Identifies all popular forums, blogs, CMS, guestbooks, wikis, and many other types of engines.

A-Parser's functionality allows you to save Rank::CMS scraper parsing settings for future use (presets), set a parsing schedule, and much more.

Results can be saved in the form and structure you need, thanks to the built-in powerful templating engine Template Toolkit which allows you to apply additional logic to the results and output data in various formats, including JSON, SQL and CSV.

Collected data

  • CMS name
  • Category name

List of supported CMS

"1C-Bitrix", "2z Project", "3dCart", "Accessible Portal", "actionhero.js", "Adobe CQ5", "Ametys", "Amiro.CMS", "AMPcms", "Anchor CMS", "AsciiDoc", "Backdrop", "Banshee", "BIGACE", "Bolt", "BrowserCMS", "Business Catalyst", "Cargo", "Chameleon", "Ckan", "CMS Made Simple", "CMSimple", "Concrete5", "Contao", "Contenido", "Contens", "ContentBox", "Cotonti", "CPG Dragonfly", "CppCMS", "Craft CMS", "Danneo CMS", "DataLife Engine", "DedeCMS", "Django CMS", "DNN", "Dotclear", "Drupal", "DTG", "Dynamicweb", "e107", "Eleanor CMS", "EPiServer", "eSyndiCat", "ExpressionEngine", "eZ Publish", "FlexCMP", "GetSimple CMS", "Google Sites", "Graffiti CMS", "Grav", "Green Valley CMS", "GX WebManager", "Hippo", "Hotaru CMS", "IBM WebSphere Portal", "ImpressCMS", "ImpressPages", "Indexhibit", "Indico", "InProces", "InstantCMS", "io4 CMS", "Jalios", "Jekyll", "Joomla", "Kentico CMS", "Koala Framework", "Koken", "Kolibri CMS", "Komodo CMS", "Koobi", "Kooboo CMS", "Kotisivukone", "LEPTON", "Liferay", "LightMon Engine", "Lithium", "LiveStreet CMS", "Locomotive", "M.R. Inc Wild CMS", "Mambo", "MaxSite CMS", "Methode", "Microsoft SharePoint", "MODx", "Moguta.CMS", "Mono.net", "Movable Type", "Mozard Suite", "Mura CMS", "Mynetcap", "Nepso", "October CMS", "Odoo", "OpenCms", "openEngine", "OpenNemas", "OpenText Web Solutions", "Ophal", "Orchard CMS", "Pagekit", "PANSITE", "papaya CMS", "PencilBlue", "Percussion", "PHP-Fusion", "phpCMS", "phpSQLiteCMS", "phpwind", "Pligg", "Plone", "Posterous", "Quick.CMS", "RBS Change", "RCMS", "RiteCMS", "Roadiz CMS", "S.Builder", "Sarka-SPIP", "SDL Tridion", "Serendipity", "Silva", "SilverStripe", "SIMsite", "Sitecore", "SiteEdit", "Sivuviidakko", "SmartSite", "sNews", "Solodev", "SPIP", "Squarespace", "Squiz Matrix", "Subrion", "swift.engine", "Textpattern CMS", "Thelia", "TiddlyWiki", "Tiki Wiki CMS Groupware", "Twilight CMS", "TYPO3 CMS", "TYPO3 Neos", "uCore", "Umbraco", "Unbounce", "Ushahidi", "viennaCMS", "Vignette", "VIVVO", "webEdition", "WebGUI", "WebPublisher", "Webs", "WebsiteBaker", "WebsPlanet", "Weebly", "Wix", "Wolf CMS", "WordPress", "XOOPS"

Capabilities

  • Identification of 161 types of CMS based on features
  • Identifies all popular forums, blogs, CMS, guestbooks, wikis, and many other types of engines based on the large and high-quality Wappalyzer feature database (over 800 technologies in total)
  • Ability to select a category or specific engines for recognition
  • Ability to specify a custom User-Agent
  • Ability to modify and supplement the feature database
  • Ability to use your own feature file (the custom-apps.json file must have a structure similar to the regular apps.json and be located at files/Rank-CMS; if done correctly, new categories and applications for selection will appear at the end of the list in the Check list option)

Use cases

  • Filtering by engines
  • Sorting large databases by engines

Queries

Queries should be a list of domains, for example:

http://a-parser.com/  
http://techcrunch.com/
http://vkusnologia.ru/
http://blogautomobile.fr/
http://avto-blogger.ru/
http://www.cyberforum.ru/

Output results examples

A-Parser supports flexible result formatting thanks to the built-in templating engine Template Toolkit, which allows it to output results in an arbitrary form, as well as in structured form, such as CSV or JSON

Default output

Result format:

$query - $cms\n

Example result:

http://blogautomobile.fr/- WordPress  
http://a-parser.com/ - XenForo
http://vkusnologia.ru/ - WordPress
http://avto-blogger.ru/ - WordPress
http://techcrunch.com/ - WordPress
http://www.cyberforum.ru/ - 1C-Bitrix

Saving in SQL format

Result format:

[% "INSERT INTO cms VALUES('" _ query _ "', '" _ cms _ "', '" _ cat _ "')\n" %]

Example result:

INSERT INTO cms VALUES('http://yandex.ru', 'unknown', 'unknown')
INSERT INTO cms VALUES('http://vk.com', 'unknown', 'unknown')
INSERT INTO cms VALUES('http://facebook.com', 'unknown', 'unknown')
INSERT INTO cms VALUES('http://a-parser.com', 'WordPress', 'CMS')
INSERT INTO cms VALUES('http://youtube.com', 'unknown', 'unknown')
INSERT INTO cms VALUES('http://google.com', 'unknown', 'unknown')

Dump results to JSON

Общий формат результата:

[% IF notFirst;
",\n";
ELSE;
notFirst = 1;
END;

obj = {};
obj.query = query;
obj.cms = p1.cms;
obj.cat = p1.cat;

obj.json %]

Начальный текст:

[

Конечный текст:

]

Example result:

[
{"cat":"unknown","cms":"unknown","query":"http://google.com"},
{"cat":"unknown","cms":"unknown","query":"http://yandex.ru"},
{"cat":"unknown","cms":"unknown","query":"http://facebook.com"},
{"cat":"CMS","cms":"WordPress","query":"http://a-parser.com"},
{"cat":"unknown","cms":"unknown","query":"http://vk.com"},
{"cat":"unknown","cms":"unknown","query":"http://youtube.com"}
]
tip

For the "Initial text" and "Final text" options to be available in the Job Editor, you need to activate "More options".

Possible settings

ParameterDefault valueDescription
User agent_Automatically substitutes the user-agent of the current Chrome version_Allows representing yourself as a specific browser or search engine
Log long running regexDetermines whether to record slow regular expressions
Check listcms, message-boards, wikisSelection of engines for checking
Emulate browser headersAbility to emulate browser headers
RegExp engineRE2Selection of the regular expression engine
Use Net::HTTPAbility to use the Net::HTTPNet::HTTP scraper for queries
Net::HTTP presetdefaultAbility to specify a preset with settings