Data that can’t be easily queried or read is like a book with the pages glued together: not very useful.
For decades, SQL was the established language used to interact with databases. Then data management requirements necessitated by cloud environments and big data use cases led to new systems. These new systems, collectively termed “NoSQL,” focused first on scalability and availability. They often sacrificed the interactivity and queryability of SQL as a false shortcut to scale.
Giving up SQL was not a necessary trade-off. In fact, this design choice created systems that were harder to use, narrowed their use case profile, and forced users to write complex programs to replicate what would have been simple SQL statements. For example, if you talk to enterprises that were early Hadoop adopters, many complain that using the new technology required the expertise of exclusive, highly skilled programmers, while legacy technologies that were SQL based were more accessible to analysts, data scientists, managers and marketers. The short-term choice to give up SQL caused a lot of pain for users, led to incompatible and proprietary query languages, and in some cases resulted in systems that were severely limited by weak interactivity and queryability.
Understanding the modern database landscape is difficult. There are a large variety of systems, many vendors, and many different and overlapping technologies. It is interesting, though, that across this landscape, almost all have learned that SQL is table stakes. The original NoSQL systems posited that SQL didn’t scale, that SQL didn’t work without relational (table-based) schema, and that SQL interfaces were inappropriate to “modern” use cases. All of these claims have been debunked. Users are driving vendors to embrace SQL. We now see NoSQL vendors advertising their SQL capabilities as competitive differentiators. It is easy to predict that over the next 12 to 18 months we will begin to see marketing literature comparing SQL compatibility across different NoSQL systems. Try explaining that one to your manager.
Let’s look at a few very different systems that have adopted SQL despite different storage designs, different primary use cases, and different data models.
“The system must provide full SQL query support and other functionality users expect from a SQL database. Features like indexes and ad hoc query are not just nice to have, but absolute requirements for our business.” (Google F1 paper). Google implemented a SQL system (F1) to manage its core adwords business. From 2012, F1 has been stated to manage 100s of TB and processes 100,000s of queries per second – a clear demonstration that SQL scales and that SQL features are critical to satisfying business users.
The headline feature of Couchbase 4.0, long a vocal proponent of NoSQL, is a SQL interface. Couchbase brands its version of SQL “N1QL.” But N1QL is nearly indistinguishable from SQL. Couchbase touts SQL benefits alongside its NoSQL messaging: N1QL enables you to query a document database without limitations and without compromise – sort, filter, transform, group, and combine data with a query. That’s right. You can combine data from multiple documents with a JOIN. That flexible data model you were promised? This is it. You’re no longer limited to “single table” and “table per query” data models. (www.couchbase.com). The strawman that users were ever limited to single table and table per query models is certainly frustrating to vendors that have seen and embraced the SQL requirement from day one – but the sentiment is certainly easy to understand.
Apache Cassandra is another prominent “NoSQL” system that has been pushed to deliver SQL as the primary database interface. The project brands its flavor of SQL “CQL” but goes to considerable length to explain that CQL is really just SQL. “CQL v3 offers a model very close to SQL in the sense that data is put in tables containing rows of columns. For that reason, when used in this document, these terms (tables, rows and columns) have the same definition than (sic) they have in SQL.” (https://cassandra.apache.org/doc/cql3/CQL.html#Preamble)
Google first birthed the MapReduce model into the marketplace. Behind the scenes the company quickly realized the shortcomings of MapReduce as a query model and built a large-scale analytic SQL-based system named “Dremel.” Dremel maps a SQL query model onto a document-based storage model at massive scale. Again, Google proves that SQL scales across volume, velocity and problem space. The Hadoop ecosystem has taken up the path, launching multiple SQL-on-Hadoop projects, including Impala, Presto, SparkSQL and Drill.
Why SQL? Data management systems must make data interactive. Interactivity is core to deriving value from the data that is being stored. All of the critical design choices made by database designers are motivated by interactivity. Vendors that put other priorities first are quickly realizing the corner they’ve backed into and are grafting SQL query processors into their core products.
If you have to support queries, why use SQL?
SQL is a Standard: It’s easy to find people who know SQL. It’s easy to connect to standard tools. A multitude of resources are available to learn SQL. Bonus: your editor already has syntax highlighting.
SQL is Declarative: With SQL, the query is written by specifying the form of the results declaratively. It is the job of the database software to understand the best way to access data, operate on it, and turn it into results. Declarative queries insulate the query-author from the underlying physical schema of the data. Compared to non-declarative processing, applications are much less fragile, and tolerate changes to the underlying schema, such as added columns or indexes, with no changes to the query itself.
SQL Scales: “SQL doesn’t scale” was oft cited as a key reason NoSQL exploded. Also frequently heard is, “To solve internet-scale problems, you need to abandon SQL.” Today, both Facebook and Google have publicly sung the praises of their SQL systems. Many NoSQL stores have added SQL or SQL-like query languages without crippling performance. New SQL systems such as VoltDB have shown that millions of SQL operations per second are within the reach of anyone with an Amazon Web Services (AWS) account.
SQL is Flexible: While there are a number of SQL standards, vendors and open-source projects have liberally extended SQL. VoltDB supports UPSERT functionality, JSON extensions and other non-standard SQL our customers have requested, all while supporting all of the standard SQL operations familiar to developers.
In sum, SQL is a proven technology. It is the easiest way to write queries. It is the most familiar way to write queries. It is the most compatible way to write queries. It is the most powerful way to write queries. The ability to query data is at the heart of the database value proposition.
Ryan Betts is CTO at VoltDB. He was one of the initial developers of VoltDB’s commercial product, and values his unique opportunity to closely collaborate with customers, partners and prospects to understand their data management needs and help them to realize the business value of VoltDB and related technologies.
Prior to joining VoltDB in 2008, Ryan was a software engineer at IBM. During a four-and-a-half year tenure, he was responsible for implementing device configuration and monitoring as well as Web service management. Before IBM, Ryan was a software engineer at Lucent Technologies for five years. In that role, he played an integral part in the implementation of an automation framework for acceptance testing of Frame Relay, ATM and IP services, as well as a high-availability upgrade capability and several internal components related to device provisioning. Ryan is an alumnus of Worcester Polytechnic Institute.
(image credit: Ishrona)