Scale Up is the New Scale Out

Scale up or scale out? As we develop better tools and strategies for treating the whole data center- real or virtual- as a “server,” the answer seems obvious: it’s all about scaling out. But that’s the big picture. While we’re scaling up, to more and more boxes, inside the box the server is becoming a miniature data center. And that change is bringing an esoteric set of programming challenges to more and more data store developers.

It’s not a question of “scale up” or “scale out” any more. Like most either/or questions in IT, the answer is: neither, but also both. “Scale up” is just “scale out” in a box.

From an IT operations point of view, it’s all good news. You can plan further in advance without knowing the specifics of what hardware you’re likely to use. Will you find it less expensive to acquire more, cheaper nodes (physical or virtual) or fewer, larger nodes as your project grows? Better not to have to worry about it early on. The extra work, though, falls on the programmers creating the next generation of scalable data stores.

First-generation scaling

When the industry began the first move from scale up to scale out, we did it in the obvious way. The individual nodes that form today’s first generation of scalable data stores are designed like old-school multi-threaded software. Complexity of scaling exists at the data center level, but within the node, the software design of today’s NoSQL servers would be familiar to a developer of the 1980s: classic, old-school threads.

MongoDB is an excellent example. At the big picture level, it’s scalable across many nodes. But dig into the documentation that covers individual nodes, and you find that on each individual node, it’s a conventional multi-threaded program. What’s wrong with that? Nothing, back in the day, when CPU cores were few. But classic multi-threaded programming requires locking in order to enable communication among threads. Threads carry out tasks and must share data. In order to protect the consistency of this shared data, the developers of multi-threaded server software have developed a variety of locking methods. When one thread is modifying data, other threads are locked out.

Multi-threaded programming works fine at small scales. But as the number of cores grows, the amount of CPU time spent on managing locks can outgrow the time spent on real work. Today’s NoSQL clusters can grow, because they’re designed for it- but the individual nodes can’t. Inside the node, we’re facing a version of the same problem that we solve by going from a master RDBMS with a replica, up to a resilient multi-master NoSQL system.

Next-generation scaling: new servers are like little data centers

The hardware on which the next generation of NoSQL systems will run has some important differences from the hardware that developers have been used to. Our software and mental tools are based on some assumptions that no longer hold. For example, not only are numbers of cores going up, but non-uniform memory access (NUMA) is the standard design for server hardware. To a conventionally written program, the OS hides the hardware design, and memory is memory from the data store point of view. But secret performance penalties lurk when you try to access memory that’s hooked up to another CPU.

Fortunately, old-school threading and locking isn’t the only programming paradigm we have available. Advanced computer science research can do for the individual node what NoSQL systems do for the data center. “Shared-nothing” designs allow all those multitudes of cores to run, doing real work, without waiting for locks. All shared data exchanged between cores must be explicitly passed as a message. The problem, though, is making new designs workable for real projects. Developers need to be able to reason about data flow, to write tests, and to troubleshoot errors. As we move from threads and locks to more efficient constructs, how will we be able to cope?

An example of a way to do it is Seastar, a new, open-source development framework designed specifically for high-performance workloads on multicore. Seastar is built around “scale out” within the server, so that as server capacity grows, projects will be able to “scale up.” Seastar reinvents server-side programming around several important concepts:

Shared-nothing design: Seastar uses a shared-nothing model to minimize coordination costs across cores.
High-performance networking: Seastar uses DPDK for fast user-space networking, without the complexity of the full-featured kernel network stack.
Futures and promises: programmers can obtain both high performance and the ability to create testable, debuggable, high-quality code.

The next generation of NoSQL will require a foundation that works as well for increasing complexity of workloads on the server as the first generation worked for increasing numbers of nodes in the data center. The kind of development tasks that elite high-performance computing projects have used is now, thanks to multicore, going to be a basic tool for production data stores where quality, maintainability, and extensibility are as important as performance.

Don Marti is Technical Marketing Manager for Cloudius Systems. Cloudius is a young and restless startup company that develop OSv: the next generation cloud operating system. Cloudius Systems is an open source centric company, led by the originators of the KVM hypervisors and employ superstar virtualization and OS veterans.