Big data is overwhelmingly powerful, overwhelmingly omnipresent, and, most of all, overwhelming to grasp. Three big data waves are currently rolling into our ports: data must be generated, gathered, aggregated, analyzed, and understood at ever-increasing speeds. With the ‘internet of things’ and the information we create growing at an exponential rate, the possibilities are unprecedented. Where data was once created at a steady pace, we now generate information – whether from phone-call metadata, intelligent refrigerators, or the GPS navigation systems in our cars – faster than ever. So fast, in fact, that there is already too much: information we have available but cannot yet turn into meaningful, useful intelligence.
Often, the data collected by a system is handled in such a way that only a small percentage of the available knowledge is extracted from it, since only so much of the data can be trawled through. There is simply not enough time to make all the connections within the vast oceans of data at our fingertips at any given moment. We need to find new and innovative ways to locate the unknown unknowns, and this task is only growing.
But how did we get here in the first place? Reaching the point where big data is such a burgeoning field took time and the evolution of technology. Big data – characterized, simplistically, by its massive volume, velocity, and variety – has been a long time coming, and with each step along the way the processes involved, i.e. the big data waves, have become more complex and the systems more intricate.
1st wave – Managing data structures
Over the past half-century, many factors have combined to necessitate new waves of innovation in data management. Massive progress and creation in technology have brought us to this point in ‘data evolution’. The evolution of data, like any evolution, is beset by problems and by the painstaking process of eking out small advances only to encounter new obstacles. But it is out of these problems that new answers can emerge, to questions previously unthought of. The push and pull between questions and their future solutions is what caused the first of the big data waves: managing data structures.
2nd wave – Managing the web and its content
With the onset of data storage and the wish to understand data – largely unstructured – came the realization that attacking the information head-on was ineffective and would not yield the desired results, despite laborious efforts. Over time, it became apparent that structure was helpful and, above all, necessary. An attitude of “We want answers! We want knowledge!” provided further incentive to work around expensive storage and slow access. Long-term storage of data that could reveal changing tendencies over time was sought, and as progress was made over the years, making new connections became easier.
Finding out how things are intertwined was another step towards our current position, but, as is so often the case, limits on handling the vastness of data became an obstacle to businesses functioning optimally. Daily or weekly inputs into data warehouses were suddenly neither frequent nor fast enough for real-time business transactions. A novel handle on the problem of decoding and extracting information from unstructured data was needed. It was found in grouping chunks of information together in an addressable way, which made these data sets simpler to handle… which leads us to the second of the big data waves: managing the web and its content.
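One way to picture "addressable chunks" is content addressing: each chunk of data is stored under a key derived from its own bytes, so it can be located directly instead of being rediscovered by scanning everything. The sketch below is purely illustrative – the `ChunkStore` class and its methods are invented for this example, not taken from any particular system.

```python
import hashlib

class ChunkStore:
    """Toy content-addressable store: chunks are keyed by their own hash."""

    def __init__(self):
        self._chunks = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, so identical
        # chunks are stored once and can always be found again.
        address = hashlib.sha256(data).hexdigest()
        self._chunks[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._chunks[address]

store = ChunkStore()
addr = store.put(b"unstructured log line #1")
assert store.get(addr) == b"unstructured log line #1"
```

Because the address is a pure function of the content, two parties holding the same chunk compute the same address independently – the property that makes large, unstructured data sets simpler to deduplicate and look up.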
3rd wave – Big Data Management
With the web becoming omnipresent in everyone’s daily lives from the nineties onward, there was a shift towards retaining and understanding even more varied unstructured data than before, in the form of audio and visual materials. With this shift came the next step for the market, one that unified these components and offered, again, new ways to look at the information. Metadata emerged as well, permitting insights into the structure and form of the stored data itself for the first time. But the challenges did not end there. With the rapid proliferation of computing across the globe, data has diversified and multiplied – and multiplied, and multiplied some more. Now there is more data than ever before, in all sorts of different forms, and it needs to be handled fast, fast, and faster. This brings us neatly to the third wave: big data management.
As the culmination of years of evolution in data management, building on the preceding waves’ advances, we have now reached the most recent tipping point. Once-costly storage and analysis methods have become cheap enough that it is not only feasible but genuinely possible to transcend the old limits on assessing stored information and finally draw deeper from the wealth of knowledge available in the data on hand. Suddenly, new insights into the patterns hiding in complex and massive amounts of data are so close within reach that they are just waiting to be uncovered. Regardless of the vastness of the input, we not only want answers to questions as yet unfamiliar to us – we are receiving them as well. What remains to be seen is whether the results trawled from these heaps of data will accurately carry over into the real world where they are applied – the ultimate test yet.
Image Credit: Jason Ralston