Dataconomy
  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
Subscribe
No Result
View All Result
Dataconomy
  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
Subscribe
No Result
View All Result
Dataconomy
No Result
View All Result

If you care about Big Data, you care about Stream Processing

by Arpan Shah
March 13, 2017
in Data Science, Data Science 101, Understanding Big Data
Home Topics Data Science
Share on FacebookShare on TwitterShare on LinkedInShare on WhatsAppShare on e-mail

As the scale of data grows across organizations with terabytes and petabytes coming into systems every day, running ad hoc queries across the entire dataset to generate important metrics and intelligence is no longer feasible. Once the quantum of data crosses a threshold, even simple questions such as what is the distribution of request latencies becomes infuriatingly slow with the usual sql and database model. Imagine running such a request on the Facebook request data, the query would take days to complete.

Stream processing offers the solution for anyone looking to manage an ever increasing volume of data.

What is stream processing

Stream processing is the handling of units of data on a record-by-record basis or over sliding time windows. This methodology limits the amount of data processed and offers a very contrasting way of looking at the data and the analytics. Complexity is traded in for speed. The nature of the analytics with stream processing are usually filtering, correlations, sampling and aggregations over the data.

Examples of the diverse applications of stream processing include:

  • Alerting on sensor data on IoT devices
  • Log analysis and statistics on web traffic
  • Risk analysis with movements of money and orders in Fintech
  • Click stream analytics on Ad Networks

 

Comparison to Batch Processing:

Batch processing Stream processing
Scope Queries over all or most of the data in the dataset. Queries or processing over data within a rolling time window, or on just the most recent data record.
Size Large numbers of records Individual records or micro batches consisting of a few records.
Performance Latencies in minutes to hours. Requires latency in the order of seconds or milliseconds.
Analyses Complex analytics. Simple response functions, aggregates, and rolling metrics.


Be truly real-time

Stream processing is the only way applications can truly be real time with their data. As soon as we talk about aggregations over the universe of data, any batch process job will continue to take longer as the amount of data increases. If you have an application that relies on real time metrics, without a stream processing solution, your view of the world is always going to be delayed.

Flow data over queries

To harness the power of stream processing, engineers and data scientists need to evolve their model from one where they run queries over their data to one where the data runs over the queries. This is a powerful shift in the way to look at the data applications.

The key shift is from tables to streams and doing operations on those streams such as

  • filtering
  • joining
  • aggregating
  • windowing

Lets compare the contrast in the batch processing world vs the stream processing world in the example application of Calculating the distribution of number of user sessions per user in a day on the website.

In batch processing: a complicated query would define what events to include in a user session, what the maximum delay between events should be for a session and computing a unique count over them. Since events in tables can occur out of order, even if a new table is maintained   every record in the table would need to be traversed for an answer

In stream processing: A stream of events would be simply be filtered to produce a stream of relevant events, this stream would then be windowed over a delay window to produce a stream of unique sessions. Finally this stream would be aggregated upon for unique counts per user with the relevant number simply being appended to a key value store.

As this example demonstrates, a less computationally intense and more real-time answer can be obtained simply be switching the way the data is perceived

Easier than ever before

Managing stream processing applications is not easy with all the moving pieces of managing issues such as fault-tolerance, partitioning and scaling and durability of the data. Over the last 7 years a lot of systems have emerged; many of which are open source and handle a lot of these issues entirely. This leaves the developer to only worry about implementing the application logic.

Some of the most exciting examples of have only launched in the last couple of years: Spark Streaming (2015), Apache Beam (2016) and Kafka Streams (2016). A full comparison of the various popular options are in figure 2.

Figure2

Any person developing applications with big data should keep the models of stream processing their mind as they choose the architecture and stack for their system. While stream processing is not a panacea for all the tasks around large scale data processing, it offers a new and important way to look at data that allows scalable real-time

 

Credits:

Figure 1: Information obtained from AWS Documentation

Figure 2: Image Courtesy: Ian Hellström on Twitter

 

Like this article? Subscribe to our weekly newsletter to never miss out!

Follow @DataconomyMedia

Tags: Big Datadata scienceStream Processing

Related Posts

Meet Photoleap, the AI photo editor that can fulfill almost every user’s needs

Meet Photoleap, the AI photo editor that can fulfill almost every user’s needs

June 9, 2023
How is artificial intelligence in surgery and healthcare changing our lives?

How is artificial intelligence in surgery and healthcare changing our lives?

June 8, 2023
These text-to-video AI tools brings Tarantino to your Reels

These text-to-video AI tools brings Tarantino to your Reels

June 8, 2023
The pursuit of creative general intelligence comes to fruition

The pursuit of creative general intelligence comes to fruition

June 8, 2023
The best anime AI generator you are looking for

The best anime AI generator you are looking for

June 8, 2023
WordPress has its own AI writing assistant now

WordPress has its own AI writing assistant now

June 7, 2023

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

LATEST ARTICLES

Canva not working right now; yes, it is really annoying

Here are the best Midjourney Prompts that will blow your mind

Decoding the secrets of code execution

Tired of unrealistic beauty filters on TikTok? Bad news, the frustration grows

Meet Photoleap, the AI photo editor that can fulfill almost every user’s needs

What are the best things about IT staff augmentation services?

Dataconomy

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy
  • Partnership
  • Writers wanted

Follow Us

  • News
  • AI
  • Big Data
  • Machine Learning
  • Trends
    • Blockchain
    • Cybersecurity
    • FinTech
    • Gaming
    • Internet of Things
    • Startups
    • Whitepapers
  • Industry
    • Energy & Environment
    • Finance
    • Healthcare
    • Industrial Goods & Services
    • Marketing & Sales
    • Retail & Consumer
    • Technology & IT
    • Transportation & Logistics
  • Events
  • About
    • About Us
    • Contact
    • Imprint
    • Legal & Privacy
    • Newsletter
    • Partner With Us
    • Writers wanted
No Result
View All Result
Subscribe

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy Policy.