Brian Gentile is Sr. Vice President & General Manager of the TIBCO Analytics Product Group within TIBCO Software. Formerly, he was Chairman and Chief Executive Officer of Jaspersoft, which was acquired by TIBCO in April 2014.
The modern demands of a new data velocity spectrum are becoming clearer each month, as more organizations recognize the newfound responsibility of using data to create more value.
There is more data available today than ever to put to work driving sales, cutting costs, building customer loyalty and improving efficiency – in short, building more value. To put more data to work, an organization needs a plan, and that plan should be based on deep knowledge of the business uses the data must serve. Above all, an organization should keenly understand how quickly it must put data to work in order to create value. Enter the New Data Velocity Spectrum, which recognizes that some data needs to be put to work instantly, while other data needs curation time, rich definition and dimension to unlock its value.
I believe that, during 2015, an understanding of this New Data Velocity Spectrum will become mainstream. Further, during this next year, even more technological options will exist for an organization of any size to be successful at any point across this spectrum.
How Fast Does the Data Need to be Put to Work?
If the primary goal is speed – putting the data to work before it becomes stale – the primary value comes from the “transaction-ality” of the data, and the information architecture should reflect that need for speed. If the primary goal is richness – defining the data broadly so it can be used flexibly across a variety of analytic uses – the primary value comes from the “dimension-ality” of the data, and the information architecture should reflect that need for richness and definition.
In the past, we tried to build analytic information systems with portions of this spectrum in mind . . . at least the portion we could solve for with readily available technology. And, in the past, even the best-practice uses of these technologies involved clear trade-offs. Often, rich, multi-dimensional query environments were constructed completely off-line and with data that had been substantially transformed (think ETL, if you’re a data warehouser). Or, real-time data feeds, probably using Enterprise Service Bus technology, populated special-purpose operational dashboards that enabled a variety of decisions with low latency. In either case, wherever persistence was needed, the underlying data storage workhorse has been the relational database management system (RDBMS).
For the past twenty-five years, RDBMS technology has broadly served the available spectrum of data velocity, acting as the underlying data construct for practically every transactional and analytic use. As the need for rich data grew (dimension-ality), data warehouse technologies were built on top of the classic RDBMS, imposing new indexing systems, stored procedures and optimized query processors. The ultimate thirst for dimension-ality has historically been quenched with an OLAP engine, which has also popularly been built atop the standard RDBMS, becoming “ROLAP” (with the obvious and popular exception of Essbase). So, for nearly three decades, the answer has been an RDBMS. Now, what was the question?
The Most Modern Technologies
Of course, our reliance on this single relational database engine has lessened recently, as we now have a much wider set of fast-growing alternatives, each better suited to a portion of the Data Velocity Spectrum that so many organizations now need to serve.
To better address our need for “transaction-ality” – the speed of data – modern streaming technologies now commonly supplement modern database tools. Technologies such as Apache Storm, Amazon’s AWS Kinesis, and TIBCO StreamBase provide constant, immediate access to and processing of data feeds from nearly any data source or type. Today, streaming data feeds power both transactional and analytic uses of data, enabling (for example) rules to be established in which real-time results trigger the insight behind fraud detection, security monitoring, service route optimization, and trade clearing all around the world.
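To make the streaming end of the spectrum concrete, here is a minimal, framework-agnostic sketch in Python – not Storm, Kinesis or StreamBase code, just the shape of the idea. A simple rule watches a feed of card transactions and raises an alert when spending inside a short sliding window crosses a threshold; the window size, threshold and field names are all illustrative assumptions.

# Framework-agnostic sketch of a streaming rule (illustrative only; a real
# deployment would run on Storm, Kinesis, StreamBase or similar).
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class Txn:
    account: str
    amount: float
    ts: float            # event time, in seconds

WINDOW_SECONDS = 60      # assumed sliding-window length
SPEND_THRESHOLD = 5000.0 # assumed per-window spending limit

recent = defaultdict(deque)  # account -> transactions still inside the window

def on_event(txn: Txn):
    """Apply the rule to each transaction as it arrives."""
    window = recent[txn.account]
    window.append(txn)
    # Evict events that have fallen out of the sliding window.
    while window and txn.ts - window[0].ts > WINDOW_SECONDS:
        window.popleft()
    if sum(t.amount for t in window) > SPEND_THRESHOLD:
        print(f"ALERT: {txn.account} exceeded {SPEND_THRESHOLD} within {WINDOW_SECONDS}s")

# Example feed: the third transaction trips the rule.
for txn in (Txn("acct-1", 2000.0, 0.0), Txn("acct-1", 2500.0, 10.0), Txn("acct-1", 900.0, 20.0)):
    on_event(txn)

The same pattern – evaluate a rule per event over a bounded window of recent history – is what the production streaming engines scale out across many nodes and many feeds at once.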
Further, our one database choice has exploded into many, as a variety of NoSQL data stores are emerging successfully. From key-value stores (Redis, Cassandra) and document-oriented databases (MongoDB, CouchDB), to Big Table structures (HBase), graph databases (Neo4j) and in-memory database caches and engines (TIBCO ActiveSpaces, GemFire) – the choices are many and powerful, and they can be complex. But choice is good, because we’re better serving the specialty business needs across this New Data Velocity Spectrum.
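The payoff of that choice is matching the data’s shape to the access pattern instead of forcing everything into one relational schema. Below is a small, library-free Python sketch of the same order modeled two ways – a flat key-value shape (in the spirit of Redis or Cassandra) and a nested document shape (in the spirit of MongoDB or CouchDB). The keys, fields and values are invented for illustration.

# Illustrative data shapes only -- no real client libraries are used here.

# 1) Key-value shape: flat keys, one small value per lookup.
#    Suits "transaction-ality": read or update one fact very fast.
kv_store = {
    "order:1001:status": "shipped",
    "order:1001:total": "129.95",
    "customer:42:last_order": "1001",
}

# 2) Document shape: one self-contained, nested record per entity.
#    Suits cases where the whole entity is read and written together.
doc_store = {
    "orders": [
        {
            "_id": 1001,
            "customer_id": 42,
            "status": "shipped",
            "items": [{"sku": "A-7", "qty": 2, "price": 49.95},
                      {"sku": "B-3", "qty": 1, "price": 30.05}],
        }
    ]
}

# Key-value access: one key, one answer.
print(kv_store["order:1001:status"])                      # -> shipped

# Document access: fetch the whole order, then work with its structure.
order = next(o for o in doc_store["orders"] if o["_id"] == 1001)
print(sum(i["qty"] * i["price"] for i in order["items"]))  # -> 129.95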
To better address our need for “dimension-ality” – and to accommodate so many new, high-volume, multi-structured data types – Hadoop has in many ways become the new Data Warehouse, complete with its array of sub-processing components. All the while, the combination of a massively parallel analytic database (Vertica, Netezza, Greenplum) and a modern, in-memory business analytics platform (TIBCO Jaspersoft, TIBCO Spotfire) now often replaces most of the functionality of traditional OLAP technologies at a fraction of the time and cost. Solving for rich definition and dimensions in data has never been easier or less expensive.
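To show how far simple in-memory analysis now goes toward the old OLAP-style rollup, here is a short sketch using pandas (assumed to be installed); the fact table, dimension table and column names are invented for illustration, and a production system would of course run the equivalent against far larger data in an analytic database or an in-memory analytics platform.

# A ROLAP-style rollup done entirely in memory with pandas (illustrative schema).
import pandas as pd

# Fact table: one row per sale.
sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 3],
    "region":     ["EMEA", "APAC", "EMEA", "EMEA", "APAC"],
    "revenue":    [100.0, 250.0, 80.0, 120.0, 60.0],
})

# Dimension table: product attributes used to slice the facts.
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "category":   ["Analytics", "Analytics", "Integration"],
})

# Join fact to dimension, then roll revenue up by two dimensions --
# the kind of slice an OLAP cube would have precomputed.
cube = (sales.merge(products, on="product_id")
             .pivot_table(values="revenue", index="region",
                          columns="category", aggfunc="sum", fill_value=0))
print(cube)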
It is a bold, new world. The winners are both the business users and the technology professionals, because modern, special-purpose technologies are fast arriving to address the growing needs across The New Data Velocity Spectrum. In 2015, matching the right information architecture and the best available technology to the specific business need on this Spectrum will become commonplace. Just in time to compete more successfully.
Figure 1: The Data Velocity Spectrum. How fast does the data need to be put to work?