The rise of Big Data and the industry's IoT craze are driving huge demand for streaming data analytics. There's an impediment, though: streaming data is hard to work with. 2016 will heighten the demand, and with it the tension around that difficulty. It may also force a solution.
In the big-data era, businesses yearn to be data-driven in their decision-making. Rather than act on hunches, they'd prefer to observe and measure, then make decisions accordingly. That's a laudable standard, but it raises the bar: a culture driven by data, in turn, drives a desire for real-time, streaming data.
And if culture didn't drive that desire, technology trends would. Analyzing Web logs in real time can drive multi-channel marketing initiatives with much more immediate impact. The extremely hyped IoT, the Internet of Things, is all about observing ephemeral sensor readings, the quintessential streaming data. But the value of that data is ephemeral as well, making real-time, streaming analytics a necessity. In the consumer packaged goods world, for example, sensors that monitor temperature and other physical attributes of manufacturing equipment help ensure things are running smoothly. Those readings can also feed predictive models that prevent breakdowns before they happen and keep the assembly lines running.
It’s exciting. The use cases and the demand for streaming are in place, and 2016 is poised to be the year when streaming analytics crosses over from niche to mainstream.
Quid pro quo?
The rewards of being able to analyze data in real time are huge, but so, often, has been the effort. Working with streaming data isn't like working with data at rest. It demands a paradigm shift: a different skill set, a different outlook, a different mindset.
To understand why, we need only rewind a bit, to the real-time data technology that preceded the Big Data era. That category of software, known as complex event processing, or CEP, set the precedent for streaming data and difficulty going hand in hand.
Back then, data was smaller and accepted latencies were higher. That kept CEP niche, which allowed it to remain difficult, expensive and inconsistent with other data technology.
The schism
Querying data at rest involves an architecture and approach that has been with us for more than 20 years: find a connector/driver/provider that can talk to your database, feed it a SQL query, and get your results back as a set of rows and columns. This is a pattern with which virtually every technologist is familiar. It’s a universal, standard, shared concept.
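To see how ingrained that pattern is, here is a minimal sketch of it in Python, using the built-in sqlite3 driver as a stand-in for any database; the table and readings are hypothetical, invented just for illustration.

```python
import sqlite3

# 1. Find a driver and connect to the database (here, an in-memory SQLite DB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temp REAL)")
conn.execute("INSERT INTO readings VALUES ('line-1', 71.5), ('line-2', 68.2)")

# 2. Feed it a SQL query.
cursor = conn.execute("SELECT sensor, temp FROM readings WHERE temp > 70")

# 3. Get the results back as a set of rows and columns.
for sensor, temp in cursor:
    print(sensor, temp)

conn.close()
```

Every mainstream driver stack, from ODBC to JDBC, follows essentially this connect, query, iterate shape.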
But with streaming data, you need to work with data structured as “messages.” Messaging systems work on the premise of “publishing” small bits of data to queues, to which other systems can “subscribe”; they are thus often referred to as pub/sub message buses. Message buses don't work like databases, and queues, publishers and subscribers don't work like tables, schemas and client drivers. The mechanics are all different. Without fighting over which model is better, the fact is that the message model is completely orthogonal to the SQL query/driver/database model that's been around for so long. And orthogonality inhibits adoption.
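To make the contrast concrete, here is a minimal in-process sketch of the pub/sub model in Python. The topic name and payload are hypothetical, and a real deployment would use an actual message bus such as Kafka or MQTT rather than a toy class like this one.

```python
from collections import defaultdict

class Bus:
    """A toy pub/sub message bus: topics map to subscriber callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A subscriber registers interest in a topic.
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # A publisher pushes a small bit of data; the bus delivers it
        # to every subscriber of that topic.
        for callback in self.subscribers[topic]:
            callback(message)

bus = Bus()
bus.subscribe("sensor.temp", lambda msg: print("received:", msg))
bus.publish("sensor.temp", {"sensor": "line-1", "temp": 71.5})
```

Notice there is no query, no schema and no result set: data arrives when the publisher decides, not when the consumer asks.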
An imperfect unification
It gets worse, because despite the distinct models for working with streaming data and data at rest, things work best when both types of data can be used together. Cross-referencing one with the other adds value to each, and the whole is proverbially greater than the sum of the parts. A model based on blending the two, called the “Lambda Architecture,” is gathering steam as well.
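As a rough illustration of the Lambda idea, here is a toy sketch in Python: a batch view recomputed over historical data, a speed view maintained over recent messages, and a serving query that merges the two. All of the names and numbers are hypothetical.

```python
# Data at rest (full history) and recent messages not yet absorbed by batch.
historical = [("line-1", 1), ("line-1", 1), ("line-2", 1)]
recent_stream = [("line-1", 1), ("line-2", 1)]

def count_by_key(records):
    counts = {}
    for key, n in records:
        counts[key] = counts.get(key, 0) + n
    return counts

def query(key):
    # Batch layer: slow, complete recomputation over all history.
    batch_view = count_by_key(historical)
    # Speed layer: incremental view over the live stream.
    speed_view = count_by_key(recent_stream)
    # Serving layer: merge both views to answer the query.
    return batch_view.get(key, 0) + speed_view.get(key, 0)

print(query("line-1"))  # 2 from history + 1 from the stream = 3
```

Note how the two data genres get separate code paths even for identical logic; that duplication is exactly the mismatch the next paragraph objects to.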
But Lambda is premised on accepting the very mismatch between working with streaming and non-streaming data. It has caved to the accident of CEP's history: that queues and messages are structures that consumers of streaming data must explicitly contend with. Giving in to that demand makes things difficult, and while one might be tempted toward a no-pain-no-gain outlook, the reality is that it doesn't need to be this way.
Will it blend?
The “physics” of streaming and conventional data are different. But that doesn’t mean they can’t be looked at through the same lens. The way we work with data at rest can, and should, be used as a metaphor – an abstraction layer – for the way we work with streaming data. We can still have database drivers. And IoT devices can present an interface that makes their data streams look like tables, based on a moving window of time.
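Here is a minimal sketch of that abstraction in Python: a class that accepts stream events as they arrive and presents, at any moment, a table-like snapshot covering a moving window of time. The window length and the readings are hypothetical.

```python
import time
from collections import deque

class WindowedStreamTable:
    """Presents a stream as a table over a moving time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.rows = deque()  # (timestamp, row) pairs, oldest first

    def append(self, row):
        # Called as each stream event arrives.
        self.rows.append((time.time(), row))

    def snapshot(self):
        # Evict rows that have aged out of the window, then return
        # what remains as an ordinary list of rows: the "table."
        cutoff = time.time() - self.window
        while self.rows and self.rows[0][0] < cutoff:
            self.rows.popleft()
        return [row for _, row in self.rows]

table = WindowedStreamTable(window_seconds=60)
table.append({"sensor": "line-1", "temp": 71.5})
print(table.snapshot())  # queryable like any other result set
```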
Doing so would allow conventional BI and data discovery tools to talk to streaming data sources without being massively re-tooled, because they'd be “fooled” into thinking the streams were conventional databases. Eventually, these tools wouldn't need to be fooled at all; they'd be updated to accommodate streaming concepts, like the length of the time window, and perhaps the type of IoT device and the protocol it uses.
A more enlightened path
But evolutionary improvements like these are very different from, and much better than, requiring a completely separate set of tools, and a context switch, every time analytics shifts from data at rest to streaming data. This unification of streaming and non-streaming should begin to materialize this year: customer demand and customers' constraints will combine into a forcing function.
When the two data genres come together, the real power of streaming and IoT will click into place. Streaming data will come to the customer, instead of the other way around. Friction will be eliminated, and casual experimentation with streaming data will begin.
Such facilitation is a prerequisite to any data technology's broad adoption. It is already happening and will continue to happen; all analytics will benefit, as will those conducting the analyses.