Dataconomy

‘Streams in the Beginning, Graphs in the End’ — Part I: Data Management for the Internet of Everything

by Venkat Krishnamurthy
June 1, 2015
in Articles, Tech

‘Streams in the Beginning, Graphs in the End’ is a three-part series by Dataconomy contributor and Senior Director of Product Management at Cray, Inc., Venkat Krishnamurthy – focusing on how big changes are afoot in data management, driven by a very different set of use cases around sensor data processing. In this first part, we’ll talk about how the bigger revolution in data management infrastructure is driven more by the increasing ease of data collection than by processing tools. 


If you like natural disaster movies, you may recall Twister, in which the star attractions were the twisters themselves. As a quick plot summary, the movie follows tornado chasers risking life and limb to get a bunch of shiny, winged sensors into the heart of an F5 twister so that they can understand these monsters better from the inside. In a way, ‘Dorothy’, the machine that digitized the tornado, foretold the arrival of the Big Data age.

It’s no exaggeration that we’re in the golden age of data management and analytics.

To us at Cray, Data has never really been any other scale than Big, and the reason for this has been the scientific method itself. Science begins with observation, and ‘data analytics’ has been fundamental to this endeavor from the beginning. In the past, this led to the invention of specialized instruments to observe the very small (microscopes) or the very large (telescopes). Arguably, this is really the first application of ‘data analytics’: in a sense, an (optical) microscope or telescope simply turns a tissue sample or patch of sky into a stream of photons analyzed by sophisticated pattern recognition engines (human brains) attached to extremely high-fidelity sensors (human eyeballs).

However, as science relentlessly advanced into ever smaller and ever larger scales simultaneously, it became humanly impossible to build equally capable instruments.

Scientists instead turned to creating scalable, high fidelity mathematical models of physical phenomena and needed tools to study them, hence giving rise to supercomputing by necessity. They use these models to study the insides of stars, the structure of the universe, or molecular dynamics. So, supercomputers have evolved primarily driven by the need to approximate reality at extreme scales – and are really versatile, multipurpose scientific instruments in disguise.

Meanwhile, major advances in data processing have been driven primarily by the commercial sector, starting with the birth of the database. Big ideas in data management like the relational model, transaction processing and SQL were birthed in this age of relatively scarce data and compute capabilities when it was too expensive to capture anything other than a carefully curated recording of key business events.

When the inevitable need arose to understand a business beyond just recording it, the central ideas of Data Warehousing and Business Intelligence were born, driven by basic business needs like financial reporting and sales analysis. Hence, the major ideas of data processing were driven primarily by a need to understand reality, albeit in a narrow business-oriented sense.

For a long time, the paths of traditional ‘supercomputing’ and data analytics didn’t quite intersect except in specialized domains like finance. This persisted until Google famously upended the status quo with the MapReduce processing model in 2004. The motivating problem at Google was to index the entire Web, but by focusing on a set of simple building blocks and principles for data processing at extreme scale, they set the stage for the Big Data revolution.
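To make those ‘simple building blocks’ concrete, here is a minimal single-process sketch of the map and reduce pattern, counting words across a couple of toy pages. The function names (`map_words`, `reduce_counts`, `run_map_reduce`) and the sample data are illustrative assumptions, not Google's implementation; a real framework would distribute the map, shuffle and reduce steps across many machines.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(page: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word on a page."""
    for word in page.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return sum(counts)


def run_map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[str, int]]],
    reducer: Callable[[str, list[int]], int],
) -> dict[str, int]:
    """Run map, group values by key (the 'shuffle'), then reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: two tiny 'web pages'.
pages = ["the web is big", "the web is a graph"]
index = run_map_reduce(pages, map_words, reduce_counts)
```

The point of the pattern is that the mapper and reducer know nothing about where the data lives, which is what lets a framework scale the same two functions out to very large inputs.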

The subsequent rapid evolution of many open frameworks to process data at scale has meant that Big Data has become a pervasive cliche, applied to every domain and to several use cases beyond this original need. Tools like Spark and Hadoop allow the average commercial company to dream really Big about their Data, but have also brought all the problems of building and using distributed computing platforms and applications into the commercial datacenter. In addition, businesses are evolving from simple counting and aggregation of business events to trying to identify sophisticated patterns in their data, inevitably bringing them closer to the computational techniques used in science.

On the flip side, supercomputers have gotten even better at approximating reality, and generate ever-increasing amounts of data in the process. As the supercomputer has become a telescope or microscope into the un-observably large or small, the ‘stream of photons’ is now a deluge of bits. Increasingly, scientists need to combine the results of these simulations with data from the real world, and identify patterns in petabyte-sized datasets. Their big data need isn’t gated so much by scale as by productivity, loosely defined as the quickest time to first result in analyzing the data they have simulated and/or collected. What is needed is the equivalent of human eyeballs and brains at this scale. This is, in essence, why convergence between Supercomputing and Big Data is inevitable.

 

 

Fig 1: The evolution of the microscope. On top, an early microscope built by Antonie van Leeuwenhoek and some samples. Below, a pictorial representation of mass-spectrometry bio-imaging, which ionizes biological samples into mass spectra.

 

Great, you say, but why are ‘Dorothy’ and a barrel of shiny artificial butterflies relevant to this? Also, while the idea of ‘convergence of supercomputing and Big Data’ sounds good, it’s still somewhat abstract. How does this all tie together?

The way we see it, the big changes for data management so far have been the ‘revolution at the center’: Storage facilities (‘Data Warehouses’), distribution facilities (‘Data Hubs’) or aquatic bodies of data (‘Data Lakes’).

In contrast, we believe that the realization of the Big Data revolution will be at the edges of data management. Here is where we see this idea of convergence fundamentally becoming reality, and driving changes in everything from the building blocks for large-scale data processing to the system architecture for platforms at Cray that can deliver on the promise.

Why is this true? We believe it has to do with two fundamental problems on either end of the analytical data management lifecycle:

  • At one end, how to handle data management when data collection is pushing towards the ‘edges’, where a large number of sensors produce data
  • At the other end, how to create a scalable model of knowledge to unify the results from any and all types of data processing of all that sensor data

To address the above, we believe that an important organizing principle for Big Data management will be ‘Streams in the Beginning, Graphs in the End’.
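As a rough illustration of the two ends of that lifecycle, the sketch below takes a time-ordered stream of sensor readings, summarizes it in fixed-width windows, and records the results as edges in a small graph. Everything here is a hypothetical assumption made for illustration (the `Reading` type, the wind-speed threshold, the `observed` relation and the `high_wind@...` node names); it is not a description of any Cray product or of the designs discussed in later parts of this series.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    timestamp: int      # seconds since the start of the stream
    wind_speed: float   # metres per second


def tumbling_windows(
    readings: Iterable[Reading], width: int
) -> Iterator[tuple[int, list[Reading]]]:
    """Group a time-ordered stream into fixed-width, non-overlapping windows."""
    current_start = None
    batch: list[Reading] = []
    for reading in readings:
        start = (reading.timestamp // width) * width
        if current_start is not None and start != current_start:
            yield current_start, batch
            batch = []
        current_start = start
        batch.append(reading)
    if batch:
        yield current_start, batch


def build_graph(
    readings: Iterable[Reading], width: int, threshold: float
) -> dict[str, set[tuple[str, str]]]:
    """Turn windowed stream summaries into (relation, node) edges per sensor."""
    graph: dict[str, set[tuple[str, str]]] = defaultdict(set)
    for start, batch in tumbling_windows(readings, width):
        by_sensor: dict[str, list[float]] = defaultdict(list)
        for reading in batch:
            by_sensor[reading.sensor_id].append(reading.wind_speed)
        for sensor, speeds in by_sensor.items():
            # Link a sensor to an event node when its mean reading is high.
            if sum(speeds) / len(speeds) > threshold:
                graph[sensor].add(("observed", f"high_wind@{start}"))
    return dict(graph)


# Hypothetical stream from two sensors.
stream = [
    Reading("s1", 0, 10.0),
    Reading("s2", 1, 40.0),
    Reading("s1", 5, 12.0),
    Reading("s1", 10, 50.0),
    Reading("s2", 12, 55.0),
]
graph = build_graph(stream, width=10, threshold=30.0)
```

The stream side only ever holds one window in memory, while the graph side accumulates relationships that can later be joined with results from other kinds of processing.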

In subsequent parts, we’ll dive into greater detail on each of the above. Stay tuned!


Image Credit: Eric Fischer / Geography of Twitter / CC BY 2.0

Tags: Cray, Data Management, internet of things, IoT, surveillance

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
