With SQL now invading the NoSQL camp, (see here), how should an organization choose between a traditional SQL database, a NoSQL data store, or NewSQL database? 2015 Turing Award winner Mike Stonebraker said it best: “one size does not fit all”. The idea that a single database product can satisfy any (or all) use cases simply isn’t true these days.
On November 25th-26th 2019, we are bringing together a global community of data-driven pioneers to talk about the latest trends in tech & data at Data Natives Conference 2019. Get your ticket now at a discounted Early Bird price!
If you are happy with the performance, scalability, and high availability of your current traditional SQL database system (the likes of Oracle, SQL Server, MySQL), then there is no reason to read further. However, if you have growing pains in any of these areas, then a NoSQL or NewSQL offering may be right for you. So how do you choose between them?
Choosing the right tool for the job at hand is 80% of getting to a solution; the other 20% is really understanding the problem you’re trying to solve. Here’s a rundown of the advantages and disadvantages of traditional SQL databases (affectionately called OldSQL in this article), NoSQL and NewSQL that can help you focus your data store choices.
Traditional SQL databases have been around for decades and serve as the core foundation of nearly every application we use today. If you have a deployed application and it is behaving and performing acceptably, that is fantastic; there is no need to change your data store. Realize that porting or replacing your database not only introduces work, it introduces risk. It is rare that a database replacement candidate is feature-for-feature, as well as bug-for-bug, compatible with the database you are replacing. Beware any vendor who suggests otherwise! Here are some of the typical patterns of a traditional database application:
The OldSQL advantages:
- Proven disk-based database technology and standard SQL support, hardened over several decades.
- Compatible with ORM layers, such as Hibernate or ActiveRecord.
- Rich client-side transactions.
- Ad hoc query and reporting.
- An established market and ecosystem with vast amounts of standards-based tooling.
The OldSQL disadvantages:
- Not a scale-out architecture. Transactional throughput is often gated by the capacity of a single machine. Scaling out requires application-defined and managed sharding (or partitioning) of the data.
- Traditional SQL systems were built for “one size fits all.” They are good for general purpose applications with modest performance requirements, but struggle as needs grow.
- Complex tuning parameters often require deep expertise to get the best balance between performance, data safety, and resource use.
First, realize that the term NoSQL is about as descriptive as categorizing dogs and horses as “NoCats”. In truth, NoSQL is a broad category collecting disparate technologies beneath an ambiguous umbrella. The term offers little help to the developer trying to decide on the right tool for the right job.
So let’s break it down with an eye on what we really care about as software engineers: what problems can I solve with NoSQL? Equally important, where is NoSQL a bad fit? Where do the different technologies show their strengths?
Some NoSQL Systems Put Availability First
Say you have gigabytes to petabytes of data. New data is added regularly and, once added, is relatively static. A database that archives sensor readings or ad impression displays is a good example. You want to store this in a cloud and are willing to tolerate the programming challenges of eventual consistency (made easier because most updates are idempotent anyway) for distributed access, multi-datacenter replication, and the highest possible availability.
Your application-to-database interactions are simple “CREATE” and “GET” patterns that don’t require traditional transactions. The most important consideration is that the database is always available to accept new content and can always provide content when queried, even if that content is not the most recent version written. Such systems include DynamoDB, Riak and Cassandra.
Some NoSQL Systems Focus on Flexibility
Made famous by MongoDB and CouchDB, the Document Model expands upon the traditional key-value store by replacing the values with JSON-structured documents, each able to contain sub-keys and sub-values, arrays of value, or hierarchies of all of the above. Often described as schemaless, these systems don’t enforce premeditated or consistent schema across all of the stored documents. This makes managing schema different… less rigid, but also much messier. It is likely the benefits of this approach are more applicable for smaller development teams with simpler data needs.
Other systems expand upon key-value stores with organizational features. Redis is popular for creating many sorted lists of data for easy ranking and leaderboards. By adding more complex functions to order by and computer statistics on, its focus allows for functionality key to its specific use cases.
Some NoSQL Systems Focus on Alternative Data Models or Special Use Cases
The most common examples are systems tuned for graph processing, such as Neo4j. Array databases are another such example; SciDB uses Python and R to access MPP array data for scientific research. Accumulo is a variation on the wide-column-store model popularized by Cassandra and BigTable, but with a focus on cell-level security. Systems like etcd are distributed datastores with a focus on storing configuration data for other services. Elasticsearch is a popular system for implementing text search within applications.
The NoSQL advantages:
- Eventual-consistency algorithms allow implementations to deliver the highest availability across multiple data centers.
- Eventual-consistency based systems scale update workloads better than traditional
OLAP RDBMs, while also scaling to very large datasets.
- Many NoSQL systems are optimized to support non-relational data, such as log messages, XML and JSON documents, as well as unstructured documents, allowing you to skip specifying schema-on-write, and allowing you to specify a schema-on-read.
The NoSQL disadvantages:
- These systems are fundamentally not transactional (ACID). If they advertise otherwise, beware the over-reaching claim.
- OLAP-style queries require a lot of application code. While the write-scaling advantages are appealing vs. OLAP stores (such as Vertica or GreenPlum), you sacrifice declarative ad hoc queries – important to historical analytical exploration.
The term NewSQL is not quite as broad as NoSQL. NewSQL systems all start with the relational data model and the SQL query language, and they all try to address some of the same sorts of scalability, inflexibility or lack-of-focus that has driven the NoSQL movement. Many offer stronger consistency guarantees.
But within this group there are many differences. HANA was created to be a business reporting powerhouse that could also handle a modest transactional workload, a perfect fit for SAP deployments. Hekaton adds sophisticated in-memory processing to the more traditional Microsoft SQL Server. Both systems are non-clustering for now, and both are designed to replace or enhance OldSQL deployments directly.
NuoDB set out to be a cluster-first SQL database with a focus on cloud-ops: run on many nodes across many datacenters and let the underlying system manage data locality and consistency for you. This comes at a cost in performance and consistency for arbitrary workloads. For workloads that are closer to key-value, global data management is a more tractable problem. NuoDB is the closest to being called eventually consistent of the NewSQL systems.
Other systems focus on clustered analytics, such as MemSQL. Distributed, with MySQL compatibility, MemSQL often offers faster OLAP analytics than all-in-one OldSQL systems, with higher concurrency and the ability to update data as it’s being analyzed.
VoltDB, the most mature of these systems, combines streaming analytics, strong ACID guarantees and native clustering. This allows VoltDB to be the system-of-record for data-intensive applications, while offering an integrated high-throughput, low-latency ingestion engine. It’s a great choice for policy enforcement, fraud/anomaly detection, or other fast-decisioning apps.
The Need for Speed: Fast in-memory SQL
Perhaps you have gigabytes to terabytes of data that needs high-speed transactional access. You have an incoming event stream (think sensors, mobile phones, network access points) and need per-event transactions to compute responses and analytics in real time. Your problem follows a pattern of “ingest, analyze, decide,” where the analytics and the decisions must be calculated per-request and not post-hoc in batch processing. NewSQL systems that offer the scale of NoSQL with stronger consistency may be the right choice.
The NewSQL advantages:
- Minimize application complexity stronger consistency and often full transactional support.
- Familiar SQL and standard tooling.
- Richer analytics leveraging SQL and extensions.
- Many systems offer NoSQL-style clustering with more traditional data and query models.
The NewSQL disadvantages:
- No NewSQL systems are as general-purpose as traditional SQL systems set out to be.
- In-memory architectures may be inappropriate for volumes exceeding a few terabytes.
- Offers only partial access to the rich tooling of traditional SQL systems.
Summing it up
As a general rule of thumb, consider evaluating NoSQL offerings if you favor availability or have special data model needs. Consider NewSQL if you’d like the at-scale speed of NoSQL, but with stronger consistency and the familiar and powerful SQL query language.
The froth in the data management space is substantial – and our tendency to talk in terms of categories (SQL, NoSQL, NewSQL) vs. problems makes it hard for software developers to understand what’s in the toolbox. The current offerings of new databases are not all alike – and recognizing how the DNA behind each helps or hinders problem solvers is the key to choosing the best solution.
John is the Vice President of Engineering at VoltDB, and has expertise growing and managing Agile development teams in early to mid-stage companies, specializing in highly-scalable Database, Web, Search/NLP, B2C and Enterprise, customer-facing product offerings.
Prior to joining VoltDB, John served as the Vice President of Web Engineering at EasyAsk Inc, where he was responsible for developing EasyAsk’s precision search and dynamic navigation solutions. Following the acquisition of EasyAsk by Progress Software, John became their Director of OpenEdge Database Division, where he oversaw three product lines – OpenEdge Database, ObjectStore, and Orbix.
(image credit: Scott McLeod)