Today's Big Data Is Not Yesterday's Big Data

The ongoing Big Data media hype stirs up a lot of passionate voices. There are naysayers (“it is nothing new”), doomsayers (“it will disrupt everything”), and soothsayers (e.g., Predictive Analytics experts). The naysayers are most bothersome, in my humble opinion. (Note: I am not talking about skeptics, whom we definitely and desperately need during any period of maximized hype!)

We frequently encounter statements of the “naysayer” variety that tell us that even the ancient Romans had big data. Okay, I understand that such statements logically follow from one of the standard definitions of big data: data sets that are larger, more complex, and generated more rapidly than your current resources (computational, data management, analytic, and/or human) can handle — whose characteristics correspond to the 3 V’s of Big Data. This definition of Big Data could be used to describe my first discoveries in a dictionary or my first encounters with an encyclopedia. But those “data sets” are hardly “Big Data” — they are universally accessible, easily searchable, and completely “manageable” by their handlers. Conversely, in today’s big data tsunami, each one of us generates insurmountable collections of data on our own. In addition, the correlations, associations, and links between each person’s digital footprint and all other persons’ digital footprints correspond to an exponential (actually, combinatorial) explosion in additional data products.
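To make the "exponential (actually, combinatorial)" point concrete, here is a minimal Python sketch. It is an illustration under simple assumptions, not a model of real data volumes: it treats each person's digital footprint as a single item and counts how many pairwise links, and how many groups of two or more footprints, could exist among n of them. The function names `pairwise_links` and `possible_groups` are hypothetical, chosen for this example only.

```python
from math import comb


def pairwise_links(n: int) -> int:
    """Count distinct pairs among n footprints: C(n, 2) = n * (n - 1) / 2."""
    return comb(n, 2)


def possible_groups(n: int) -> int:
    """Count distinct groups of two or more footprints.

    There are 2**n subsets in total; subtract the n single-item
    subsets and the one empty subset.
    """
    return 2**n - n - 1


if __name__ == "__main__":
    # Illustrative population sizes only; these are not measurements.
    for n in (10, 100, 1000):
        links = pairwise_links(n)
        groups = possible_groups(n)
        # The group count grows so fast that printing its digit count
        # is more readable than printing the number itself.
        print(f"n={n}: {links} pairwise links, "
              f"group count has {len(str(groups))} digits")
```

Pairwise links grow quadratically with the number of footprints, while the number of possible groupings grows exponentially, which is the sense in which derived data products can outpace the raw data itself.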

Nevertheless, all of these clear signs that today’s big data environment is something radically new do not stop the naysayers. With the above standard definition of big data in their quiver, the naysayers are fond of shooting arrows through all of the discussions that would otherwise suggest that big data are changing society, business, science, media, government, retail, medicine, cyber-anything, etc. I believe that this naysayer type of conversation is unproductive, unhelpful, and unscientific. The volume, complexity, and speed of data today are vastly different from anything that we have previously experienced, and those facts will be even more emphatic next year, and even more so the following year, and so on. In every sector of life, business, and government, the data sets are becoming increasingly off-scale and exponentially unmanageable. The 2011 McKinsey report “Big Data: The Next Frontier for Innovation, Competition, and Produc…” made this abundantly clear. When the Internet of Things and machine-to-machine applications really become established, the big data V’s of today will seem like child’s play.

In an attempt to illustrate the enormity of scale of today’s (and tomorrow’s) big data, I have discussed the exponential explosion of data in my TEDx talk “Big Data, small world” (you can fast-forward to my comments on this topic, starting at approximately 8:50 in the video). You can also read more about this topic in the article “Big Data: Compound Interest Growth on Steroids”, where I have elaborated on the compound growth rate of big data: the numbers will blow your mind, and they should blow away the naysayers’ arguments.

(image credit: greeblie)

Kirk is a data scientist, top big data influencer and professor of astrophysics and computational science at George Mason University. He spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as Data Archive Project Scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He is currently working on the design and development of the proposed Large Synoptic Survey Telescope (LSST), for which he is contributing in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach.