Hadoop is designed to make storing and processing large amounts of data affordable. It does this in two ways:
- Through hardware, by running on clusters of inexpensive commodity servers rather than specialised high-end machines, and
- Through software, by being distributed as open source under the Apache licence
Open-source software essentially means Hadoop is free, or more specifically that no licence fee needs to be paid. On the other hand, there are a number of weaknesses that make it less than ideal for commercial use:
1. Integration with existing systems
Hadoop is not optimised for ease of use. Installing it and integrating it with existing databases can prove difficult, especially since no vendor support is provided with the open-source distribution.
2. Administration and ease of use
Working with Hadoop requires knowledge of MapReduce, while most data practitioners are used to SQL (see the MapReduce sketch after this list). This means significant training may be required before teams can use and administer Hadoop clusters.
3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment, especially where sensitive data is concerned.
4. Single Point of Failure (SPOF)
In the original version of Hadoop, a single node (the HDFS NameNode) is responsible for tracking where all data is located, so the cluster becomes unusable should that node fail. This issue has been resolved in Hadoop 2.0, which allows a standby node to take over (see the high-availability sketch after this list).
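To make the learning-curve point in (2) concrete, below is a sketch of the canonical word-count job written against Hadoop's MapReduce Java API; the input and output paths are supplied on the command line and are illustrative. For comparison, the same aggregation in SQL is a single GROUP BY query, e.g. `SELECT word, COUNT(*) FROM words GROUP BY word` over a hypothetical `words` table.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Job wiring: mapper, combiner, reducer, output types, input/output paths
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job is compiled into a JAR and submitted with something like `hadoop jar wordcount.jar WordCount /input /output`. The point is not that MapReduce is especially hard, but that even a trivial aggregation involves a mapper, a reducer and job wiring that SQL users never have to write.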
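To illustrate how Hadoop 2.0 addresses the SPOF in (4): HDFS high availability puts two NameNodes, one active and one standby, behind a single logical nameservice, and clients fail over between them automatically. The sketch below shows only the client-side view, with the nameservice name `mycluster` and the host names chosen for illustration; a real deployment also needs server-side pieces (shared edit storage and failover controllers) that are not shown here.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Clients address a logical nameservice rather than a single NameNode host
    conf.set("fs.defaultFS", "hdfs://mycluster");
    conf.set("dfs.nameservices", "mycluster");

    // Two NameNodes back the nameservice: one active, one standby
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");

    // Proxy provider that retries against the standby if the active NameNode dies
    conf.set("dfs.client.failover.proxy.provider.mycluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // The client no longer depends on any single NameNode being up
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Entries under /: " + fs.listStatus(new Path("/")).length);
  }
}
```

In practice these properties usually live in `hdfs-site.xml` and `core-site.xml` rather than being set in code, but the effect is the same: the cluster survives the loss of a NameNode.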
Pure-play Hadoop companies such as Cloudera, MapR and Hortonworks seek to make Hadoop more enterprise-ready by developing platforms that plug these gaps, though these may entail licensing fees.
In conjunction with Big Data Week, we compiled a list of the best Big Data articles here.
(Image credit: Timothy Appnel)