Dataconomy

Understanding Big Data: Open Source

by Eileen McNulty
July 17, 2014
in Articles

The open source technology market is huge. It fuels 1 million unique projects today, and opens up massive opportunities for small and large enterprises alike. Small companies can deploy technologies cost-effectively, and large enterprises have the means to scale. As John Gallaugher points out, Google has over 1.4 million servers; without open source technology, licensing costs on that scale would be enormous.

Yet open source is a multi-billion dollar industry. For technology that’s ostensibly free to use and develop, where is the money coming from? In this installment of Understanding Big Data, we’ll be looking at some of the leading open source providers, and at how much you can actually get for free.

The Apache Foundation

The Apache Foundation has been providing users with community-led, open-source solutions for 15 years. It currently has nearly 150 top-level projects, covering a vast spectrum of technologies. Notable projects include:

  • Hadoop- In our overview of Hadoop, we defined it as an “open-source framework for processing, storing and analysing data.” The fundamental principle behind Hadoop is that, rather than tackling one monolithic block of data in one go, it’s more efficient to break the data up and distribute it into many parts, so that different parts can be processed and analysed concurrently. The Apache Foundation also develops a range of Hadoop integrations, which you can find out about here.
  • Lucene- Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications. The Apache Foundation also developed Solr, an enterprise search server based on Lucene’s search library.
  • Storm- Storm is a project currently being incubated by the Apache Foundation. The aim of Storm is to do for real-time processing what Hadoop did for batch processing. The main selling point of Storm is that it’s really, really fast: a benchmark clocked it at over a million tuples processed per second per node. It is also scalable, fault-tolerant and supports a range of programming languages. Current users include Twitter, the Weather Channel and WebMD.
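The split-and-process idea behind Hadoop can be illustrated on a single machine. The following is a minimal Python sketch, not Hadoop code: it uses a thread pool from the standard library as a stand-in for a cluster, and the function names (split_into_chunks, count_words, word_count) are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_chunks):
    """Divide the input into roughly equal parts (the 'distribute' step)."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def count_words(chunk):
    """Process one part independently of all the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def word_count(lines, n_chunks=4):
    """Count words by processing chunks concurrently, then merging results."""
    chunks = split_into_chunks(lines, n_chunks)
    # Assumption: threads stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:  # combine the per-chunk results
        total.update(partial)
    return total
```

Calling word_count(["big data big", "open source data"], n_chunks=2) gives the same counts as processing every line in one pass; the difference is that each chunk could, in principle, be handled by a different worker.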

Oracle

Oracle RDBMS is one of the leading RDBMS solutions, although, unlike the other technologies covered here, it is proprietary rather than open source. Oracle’s SQL-based RDBMS, released in 1979, was the first commercially available technology of its kind, and revolutionised relational databases. Oracle introduced partitioning in 1997, internet computing in 1999, and application clusters in 2001.

Their customer list, as you might expect, is extensive; high-profile customers include Vodafone, BT, Aria Systems and Deutsche Börse AG.

A comprehensive and insightful beginner’s guide to Oracle RDBMS can be found here.

MySQL

MySQL is currently owned by Oracle, after Oracle acquired Sun Microsystems in 2010. Its website claims it’s the most widely used open-source database solution in the world, although parent company Oracle’s own RDBMS may have it beat on overall usage.

MySQL was originally developed in 1994 by Michael Widenius and David Axmark and named after Widenius’ daughter, My. MySQL is ACID compliant, supports transactions and row-level locking, and is highly scalable. It also supports a much broader range of operating systems than many other database management systems; as well as Windows, OS X and Linux, it can run on BSD, UNIX, AmigaOS, iOS, Symbian and Android.

It powers a range of well-known applications like WordPress, phpBB and Drupal, and is used by Wikipedia, Google, Facebook, Twitter, Flickr and YouTube.
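To make "supports transactions" concrete, here is a minimal sketch of an all-or-nothing update. It deliberately uses SQLite through Python's standard library sqlite3 module as a stand-in, so that it runs without a database server; it is not MySQL code, and the accounts table and the transfer and balances helpers are hypothetical names invented for this illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two existing accounts as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False  # the debit above has been rolled back


def balances(conn):
    """Return all balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()
```

A transfer that would overdraw the source account returns False and leaves both balances untouched, which is the atomicity part of ACID: either both updates happen or neither does.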

There’s been widespread discussion about whether SQL-based technologies can compete in a big data environment, but as Carl W. Olofson of the International Data Corporation (IDC) stated earlier this year: “Without overstating the case, the MySQL movement is still revolutionary and is still young. It’s important to drive people to it and get them excited about it. MySQL is doing the jobs people need done.”

NoSQL Databases

Many of the leading NoSQL databases have open source offerings. These include:

  • MongoDB- MongoDB is a document-store database, and is (according to DB-Engines) the 5th most popular database management system in the world.
  • Cassandra- Cassandra is a wide-column (or columnar) database focused on performance and scalability, supported by the Apache Software Foundation.
  • HBase- HBase is a NoSQL columnar database which is designed to run on top of HDFS. It is modelled after Google’s BigTable and written in Java. It was designed to provide BigTable-like capabilities to Hadoop, such as the columnar data storage model and storage for sparse data.
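The difference between a document store and a wide-column store is easier to see with data in hand. The sketch below models both in plain Python; it uses no MongoDB, Cassandra or HBase API, and the find helper and SparseTable class are hypothetical names used only to illustrate the two data models.

```python
from collections import defaultdict

# Document store (MongoDB-style): each record is a self-contained,
# possibly nested document, and fields can differ between documents.
documents = [
    {"_id": 1, "name": "Ada", "emails": ["ada@example.com"]},
    {"_id": 2, "name": "Grace", "address": {"city": "New York"}},
]


def find(docs, **criteria):
    """Return documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


class SparseTable:
    """Wide-column sketch: row key -> {"family:qualifier" -> value}."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        # Use .get on the outer mapping so reads never create empty rows.
        return self._rows.get(row_key, {}).get(column, default)

    def stored_cells(self):
        """Count only the cells that were actually written."""
        return sum(len(cols) for cols in self._rows.values())


table = SparseTable()
table.put("user1", "info:name", "Ada")
table.put("user2", "info:name", "Grace")
table.put("user2", "geo:city", "New York")
```

Here user1 has no geo:city value, and nothing is stored for it: only the three cells that were written take up space. That is the sense in which a BigTable-style model suits sparse data.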

More information about different NoSQL solutions can be found in this previous installment of “Understanding Big Data”.

Are They Actually Free?

Short answer: yes. All of the technologies mentioned above are available to download and deploy for free. It is worth keeping in mind that there are various open source licensing agreements (such as GPL and the Apache License), which have different provisions and are constantly evolving. It’s undoubtedly worth doing your research; Black Duck’s Knowledge Base is a good place to start looking at the freedoms and limitations of open source licenses.

But you may have noticed that some of the companies mentioned above are multi-million dollar enterprises. How do they make this money? By offering “enterprise” editions of their products, with added features.

The Apache Software Foundation is entirely open source. However, there are several external companies which offer “enterprise-class” Hadoop, such as Hortonworks and Cloudera. They offer the Hadoop technologies with added features such as greater security and stability, training for companies unfamiliar with the technology, and exclusive integrations with other technologies. There’s a similar industry around Apache Cassandra; DataStax offers Cassandra with added security, search, analytics and management features.

Oracle makes its RDBMS available as a free download for development and testing, but it also offers a vast range of products that aren’t free; downloading and deploying its specialist big data management system could cost in the region of $300,000.

The open source MySQL offering is known as the “community edition”; Oracle also offers a Standard Edition for $2,000, an Enterprise Edition for $5,000 and a Cluster Carrier Edition for $10,000. A full breakdown of the features available in each edition can be found here.

Many of the NoSQL solutions, such as Couchbase and MongoDB, operate on a “freemium” model. Their core technology is open sourced, but the enterprise edition with more features and greater security and support is monetised.

So what can you get for free? In summary, quite a lot. You can have access to world-leading database management systems without parting with a penny. But if you do have money to spend, there are plenty of enterprises out there that want to make it easier for you to take the first steps towards crafting a big data architecture.

(Featured Image Credit: Raconteur)

Eileen McNulty-Holmes – Editor

Eileen has five years’ experience in journalism and editing for a range of online publications. She has a degree in English Literature from the University of Exeter, and is particularly interested in big data’s application in humanities. She is a native of Shropshire, United Kingdom.

Tags: Cassandra, couchbase, Database Technology Newsletter, Hadoop, lucene, mongoDB, MySQL, Oracle, solr, storm, Weekly Newsletter

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
