Powering real-time applications involves scaling not only existing enterprise applications, but also new applications that have emerged from the web, social media, and mobile devices. Overwhelmed by massive data growth, businesses must carefully select cost-effective technologies that enable applications to manage both today's data volumes and their exponential growth in the future.
Business applications commonly access data using Structured Query Language (SQL) through legacy relational database management systems (RDBMSs), such as Oracle Database, IBM DB2, or MySQL. However, many existing SQL applications are being overwhelmed by data growth because the legacy RDBMSs that support them often hit a wall, either from a cost or performance perspective. Scaling up these systems can mean investing in expensive database hardware replacements, while migrating to alternative technologies, like NoSQL databases, often requires cost-prohibitive application rewrites. This leaves enterprises looking for a middle ground that combines cost-effective scalability and standard RDBMS features.
Supporting Applications of Old and New
The modern world of web, social, mobile and Internet of Things (IoT) applications is very demanding on databases, requiring them to ensure real-time responses while handling massively increasing data volumes at increasing velocity.
These new applications are extracting information from a plethora of sources within the Internet of Things, including countless sensors, social platforms, and mobile devices. They serve a number of use cases, such as digital marketing, healthcare, and fraud detection, which share three key workload characteristics: complex, interactive queries; data updates in real time; and high concurrency of small reads and writes.
The key to riding the wave of Big Data and enabling digital applications in real time is to select a database that can support massive data growth without bursting IT budgets. That means businesses need a database that can scale, while maintaining superior performance. The chart below breaks down the differences between legacy databases of the past, and scalable, high-performing and cost-effective databases of the future.
PAST | FUTURE |
Small data volumes; purge often | Massive data volume; retain forever |
Slow data velocity | Rapid data velocity |
Rigid, static data of similar structure | Flexible, fluid data of many structures |
One-to-one, shared disk architecture | Many-to-many, shared nothing architecture |
Primary storage | Scale both writes and reads |
Scale-up on proprietary hardware | Scale-out on commodity hardware |
Choosing a Database without Sacrificing the Two P’s
Selecting a database ultimately comes down to two magic words: price and performance. If money is no object, scale-up platforms, such as Oracle Exadata or IBM DB2, work well because they often do not require changes to applications, and many businesses like the reliability and security of a proprietary, legacy system. However, scaling up requires a hardware migration every time a new, larger server replaces the old one, a process that suffers from the law of diminishing returns: costs will rise significantly faster than performance, and eventually, technological innovation will plateau to the point where high performance cannot be achieved, no matter what the price is.
Four Reasons to Choose The Hadoop RDBMS to Power Real-Time Apps
Application developers who wish to tap into large amounts of real-time data need an affordable, scalable option for web and mobile application development. There are four key reasons why the Hadoop RDBMS can be a compelling solution for powering real-time applications.
- General-purpose platform – The Hadoop RDBMS is a general-purpose, operational database capable of handling mixed operational and analytical workloads (i.e. OLTP and OLAP) with real-time queries. Unlike NoSQL solutions that effectively handle simple web applications, but do not perform well with transactional operations, the Hadoop RDBMS easily handles the data updates across multiple rows and tables required by mission-critical business applications.
- Full SQL support – By combining full ANSI SQL support with the Hadoop ecosystem, businesses can scale out from gigabytes to petabytes using the Hadoop RDBMS without needing to rewrite their existing SQL applications or retrain their IT staff, as NoSQL solutions require.
- Real-time updates with transactions – The issue with many NoSQL solutions is that they sacrifice transactions, which makes it very difficult to concurrently update data across multiple rows or tables in real time. A Hadoop RDBMS, on the other hand, supports full ACID transactions, allowing for thousands of concurrent users and remote connections to access and alter data simultaneously. Transactional integrity is vital to powering real-time applications, and more enterprises are requiring its foundational capabilities in their databases.
- Developer framework support – Application developers can be reassured that a Hadoop RDBMS supports a number of developer frameworks, including .NET, Java, Python, and Ruby on Rails, as well as JavaScript frameworks such as AngularJS. This allows developers to build applications quickly and easily, using the tools with which they are most productive.
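The transactional point in the list above is easier to see in code. The following is a minimal sketch, not Hadoop RDBMS code: it uses Python's built-in sqlite3 module as a stand-in for any ACID-compliant SQL database, and the accounts and transfers tables, the transfer function, and the balances helper are all hypothetical names chosen for illustration. It shows one transaction updating two rows in one table and inserting into a second, with every change rolled back if any step fails.

```python
import sqlite3

# SQLite (in memory) is only a stand-in for an ACID-compliant SQL database;
# the schema and names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    CREATE TABLE transfers (
        id INTEGER PRIMARY KEY,
        src INTEGER, dst INTEGER, amount INTEGER
    );
    INSERT INTO accounts (id, balance) VALUES (1, 100), (2, 50);
""")


def transfer(conn, src, dst, amount):
    """Credit one row, debit another, and log the transfer, atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # The CHECK constraint rejects an overdraft on this statement.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "INSERT INTO transfers (src, dst, amount) VALUES (?, ?, ?)",
                (src, dst, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as an {account_id: balance} dict."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

In this sketch a call such as transfer(conn, 1, 2, 500) fails on the debit, and the credit that had already been applied to account 2 is undone along with it. That all-or-nothing behavior is what an application has to build for itself when the database does not provide transactions.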
Although emerging SQL-on-Hadoop solutions claim to support SQL, their capabilities are often limited. Enterprises will find all of the key functionality they have currently in their legacy SQL databases in the Hadoop RDBMS, including:
- Joins
- Secondary indexes
- Aggregations
- Stored procedures
- Window functions
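As a rough illustration of three of the features listed above (a join, a secondary index, and an aggregation), here is a minimal sketch using Python's built-in sqlite3 module as a stand-in SQL engine. It is not Hadoop RDBMS code, and the customers and orders tables, the index name, and the sample rows are hypothetical. Stored procedures and window functions are left out for brevity; SQLite, the stand-in used here, does not offer stored procedures.

```python
import sqlite3

# SQLite (in memory) is only a stand-in SQL engine; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total INTEGER
    );
    -- Secondary index: speeds up lookups of orders by customer.
    CREATE INDEX idx_orders_customer ON orders (customer_id);

    INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Grace', 'US'), (3, 'Linus', 'EU');
    INSERT INTO orders VALUES (1, 1, 40), (2, 1, 60), (3, 2, 25), (4, 3, 10);
""")

# Join the two tables, then aggregate order counts and revenue per region.
revenue_by_region = conn.execute("""
    SELECT c.region, COUNT(*) AS order_count, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(revenue_by_region)  # [('EU', 3, 110), ('US', 1, 25)]
```

The whole report is a single declarative statement, and the database decides how to use the index and execute the join.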
When NoSQL emerged on the database market, many companies threw the baby out with the bathwater; in other words, they prematurely got rid of SQL and the benefits that came with it, believing that NoSQL was the only option to effectively scale their applications. But NoSQL systems often lack strong data consistency, can be difficult to test and maintain, and lack a standard structured query language. Over time, many of these companies learned that NoSQL was not the right fit, because for some applications, NoSQL databases force developers to reinvent the wheel, as they need to re-implement joins and transactions at the application level.
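To make "re-implementing joins at the application level" concrete, here is a minimal sketch in Python. The two in-memory collections stand in for data fetched from a hypothetical key-value or document store; the customers and orders structures and the revenue_by_region function are illustrative names, not any product's API.

```python
from collections import defaultdict

# Hypothetical data as it might come back from a store without joins:
# customers keyed by id, orders as a flat list of documents.
customers = {
    1: {"name": "Ada", "region": "EU"},
    2: {"name": "Grace", "region": "US"},
    3: {"name": "Linus", "region": "EU"},
}
orders = [
    {"id": 1, "customer_id": 1, "total": 40},
    {"id": 2, "customer_id": 1, "total": 60},
    {"id": 3, "customer_id": 2, "total": 25},
    {"id": 4, "customer_id": 3, "total": 10},
]


def revenue_by_region(customers, orders):
    """Hand-written equivalent of a SQL JOIN followed by GROUP BY region."""
    totals = defaultdict(int)
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            # Orphaned order: an inner join would drop it, so the code must too.
            continue
        totals[customer["region"]] += order["total"]
    return dict(totals)


print(revenue_by_region(customers, orders))  # {'EU': 110, 'US': 25}
```

Even in this small case the application owns the matching logic, the handling of missing keys, and the grouping, and it must pull both collections over the network before it can start. In SQL the same result is a single JOIN with a GROUP BY.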
Today, technology has evolved to provide massive scalability without sacrificing functionality and performance. The Hadoop RDBMS effectively combines the reliability and functionality of an RDBMS with the scale-out of NoSQL, making it an optimal choice for powering real-time web, mobile, social OLTP, and Internet of Things applications.
Monte Zweben is Chairman and CEO of Splice Machine. He spent the early years of his career with the NASA Ames Research Center as the Deputy Chief of the Artificial Intelligence Branch, eventually leading his team to win the Space Act Award. He went on to become a successful serial entrepreneur with major exits, including a $2.9 billion IPO for Blue Martini and a $225 million sale of Red Pepper Software. Author of various scientific and business articles in scholarly journals and proceedings, including Harvard Business Review, Artificial Intelligence, Cognitive Science, AAAI, and IJCAI, he holds degrees in Computer Science from Carnegie Mellon University and Stanford University.
(Image Credit: Philip Kromer / n1atsigns2 / CC BY SA 2.0 )