When you start analyzing the Hadoop landscape, the first question that comes up is: which distribution should I go for? Here is a comparison of the key players in this space on various parameters.
Cloudera is miles ahead of the others when it comes to popularity. It is the veteran of the group and has hundreds of clients, including Groupon and Klout, so its market penetration is quite high. Next in the race is Hortonworks, a spin-off of Yahoo. Compared to Cloudera it is a newcomer, but it is making a lot of inroads, and its partnerships with Microsoft, Informatica and Teradata are helping it gain popularity. MapR is the other distribution (more of a proprietary solution); it is also quite popular and has an association with Amazon. The graph below depicts the popularity of various Hadoop distributions.
The latest version from Cloudera is CDH4.4. Cloudera offers two editions:
- Cloudera Standard is the free edition. It includes the full CDH distribution plus Cloudera Manager for automated deployment, centralized cluster management and monitoring, and a full set of diagnostic tools. The maximum Cloudera Manager feature set requires an upgrade to Cloudera Enterprise.
- Cloudera Enterprise is the paid edition. It includes everything in Cloudera Standard plus additional Cloudera Manager features (see below), Cloudera Navigator for data management, and Cloudera Support.
Hortonworks offers the following:
- Hortonworks Data Platform (HDP) 1.3 is a 100% open-source Hadoop distribution. Hortonworks is the only vendor that offers a Hadoop distribution for Windows.
- HDP 2 is presently in Beta.
MapR offers the following editions:
- M7 Edition is the enterprise NoSQL and Hadoop edition. It is a complete distribution for Apache Hadoop that delivers ease of use, dependability and performance advantages for NoSQL and Hadoop applications. According to MapR, M7 removes the trade-offs organizations face when deploying a NoSQL solution: it provides scale, strong consistency, reliability and continuous low latency with an architecture that does not require compactions or background consistency checks.
- M5 Edition is also a complete distribution for Apache Hadoop, one that delivers enterprise-grade features for all file operations on Hadoop. Features include mirroring, snapshots, NFS HA and data placement control, aimed at demanding mission-critical environments.
- M3 Edition is the free edition. It delivers a fully random read-write capable platform that supports industry-standard interfaces (e.g., NFS, ODBC), and provides management, compression and performance advantages.
Cloudera and Hortonworks are built on top of Apache Hadoop, whereas MapR is not: the MapR file system has been developed in C. MapR does not use HDFS. Instead, it has its own distributed file system with an NFS interface, which makes it mutable and mountable, unlike HDFS.
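To make "mutable and mountable" a bit more concrete, here is a minimal Python sketch of what an NFS-mounted file system allows: ordinary file calls, including overwriting bytes in the middle of an existing file. This is only an illustration under assumptions. The `mount_point` below is a temporary local directory standing in for a real cluster mount, and the file name and helper `overwrite_in_place` are hypothetical. On plain HDFS, by contrast, files are written once and an in-place update like this is not supported.

```python
import os
import tempfile


def overwrite_in_place(path, offset, data):
    """Overwrite bytes at `offset` in an existing file without rewriting the rest."""
    with open(path, "r+b") as f:  # open for read/write, do not truncate
        f.seek(offset)            # random access: jump to the target position
        f.write(data)


# ASSUMPTION: a temp directory stands in for the NFS mount of the cluster.
mount_point = tempfile.mkdtemp()
path = os.path.join(mount_point, "events.log")  # hypothetical file name

# Create the file with ordinary file I/O, as any NFS client could.
with open(path, "wb") as f:
    f.write(b"status=PENDING;id=42\n")

# Update the status field in place; the replacement has the same length (7 bytes).
overwrite_in_place(path, len(b"status="), b"DONE   ")

with open(path, "rb") as f:
    result = f.read()

print(result)
```

Because the file system is mounted like any other directory, existing tools and scripts can read and write cluster data without going through a Hadoop-specific client.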
As with any open-source project, the player most likely to win this race is the one with the biggest community behind it. Both Cloudera and Hortonworks distribute 100% open-source variants of Hadoop; however, some of Cloudera's tools are not open source. Both Cloudera and Hortonworks have been contributing to Hadoop, but Cloudera has a slight edge here.