Dataconomy

What Is Google Cloud Dataflow?

by Eileen McNulty
August 8, 2014

To say that the cloud computing market is exploding would be an understatement. In July, multiple reports supported the proclamation of cloud as the next revolution in the computing industry. IDC claimed the cloud computing market in EMEA would be worth $4 billion by the close of the year. In the UK, 78% of organisations have “formally” adopted one or more cloud-based services. Fujitsu recently announced it has set aside $2 billion to expand its cloud portfolio. Evidently, the cloud is big business.

The market continues to be dominated by Amazon Web Services, with Microsoft and IBM making serious inroads. But there’s one industry giant missing from this list: Google. In Q2, Microsoft’s cloud infrastructure revenue grew by 164%; Google lagged at only 47%. But Google have a secret weapon in their cloud portfolio whose release may skyrocket their market share: Google Cloud Dataflow.

What is Google Cloud Dataflow?

Figure: Cloud Dataflow keynote slide

In short, Cloud Dataflow allows you to build pipelines, monitor their execution, and transform and analyse data, all in the cloud. In a “sneak peek” blog post, Google stated that Cloud Dataflow will allow you to gain “actionable insights from your data while lowering operational costs without the hassles of deploying, maintaining or scaling infrastructure.” It is currently in private beta, but here’s an overview of what we know so far:

  • It’s multifunctional: As a generalisation, most database technologies have one speciality, like batch processing or lightning-fast analytics. Google Cloud Dataflow counts ETL, batch processing and streaming real-time analytics amongst its capabilities.
  • It aims to address the performance issues of MapReduce when building pipelines: Google was the first to develop MapReduce, and the function has since become a core component of Hadoop. Cloud Dataflow has now largely replaced MapReduce at Google; according to Urs Hölzle, Google’s Senior VP of Technical Infrastructure, the company stopped using MapReduce “years ago”.
  • It’s good with big data: Hölzle stated that MapReduce performance started to decline sharply when handling multi-petabyte datasets. Cloud Dataflow apparently offers much better performance on large datasets.
  • The coding model is pretty straightforward: The Google blog post describes the underlying service as “language-agnostic”, but the first SDK is for Java. All datasets are represented as PCollections (“parallel collections”). The SDK includes a “rich” library of PTransforms (parallel transforms), including ParDo (similar to the Map and Reduce functions and to WHERE in SQL) and GroupByKey (similar to the shuffle step of MapReduce and to GROUP BY and JOIN in SQL). A starter set of these transforms can be used out of the box, including Top, Count and Mean.
  • It “evolved” from Flume and MillWheel: Flume lets you develop and run parallel pipelines for data processing. MillWheel allows you to build low-latency data-processing applications.
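
To make the coding model described above more concrete, here is a minimal sketch in plain Python of what ParDo and GroupByKey do conceptually. It is not the Cloud Dataflow SDK (which is Java-first and still in private beta): the function names `par_do` and `group_by_key` and the sample data are hypothetical, and an ordinary list stands in for a PCollection.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def par_do(elements: Iterable, fn: Callable[[object], Iterable]) -> List:
    """Apply fn to each element independently; fn may emit zero, one or many outputs."""
    out = []
    for element in elements:
        out.extend(fn(element))
    return out


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Collect all values that share a key, like the shuffle step of MapReduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


# A plain list stands in for a PCollection (assumption for illustration only).
lines = ["the cloud is big", "the cloud is growing"]

# ParDo used like a Map: one line in, many (word, 1) pairs out.
pairs = par_do(lines, lambda line: [(word, 1) for word in line.split()])

# ParDo used like WHERE in SQL: emit the element only if it passes the test.
filtered = par_do(pairs, lambda kv: [kv] if len(kv[0]) > 2 else [])

# GroupByKey followed by a per-key sum gives the behaviour of a Count transform.
grouped = group_by_key(filtered)
counts = {word: sum(ones) for word, ones in grouped.items()}
print(counts)
```

Because each call to `fn` only sees one element, a real service would be free to spread the ParDo work across many machines; this sketch runs it in a single loop.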

What Does Cloud Dataflow Mean For Existing Google Cloud Customers?

Figure: Google Cloud Dataflow keynote slide

Dataflow is designed to complement the rest of Google’s existing cloud portfolio. If you’re already using Google BigQuery, Dataflow will allow you to clean, prep and filter your data before it gets written to BigQuery. Dataflow can also be used to read from BigQuery if you want to join your BigQuery data with other sources. The joined result can also be written back to BigQuery.
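
The clean, filter and join steps described above can be illustrated with a short sketch in plain Python. It makes no calls to BigQuery or Dataflow: in-memory lists of dictionaries stand in for tables, and the names `clean`, `join_on_user`, `raw_events` and `profiles` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def clean(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Normalise user names, parse amounts, and drop rows that cannot be repaired."""
    for row in rows:
        user = (row.get("user") or "").strip().lower()
        if not user:
            continue  # filter: no usable key
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # filter: amount is missing or not numeric
        yield {"user": user, "amount": amount}


def join_on_user(left: List[Dict], right: List[Dict]) -> List[Dict]:
    """Inner join of two row lists on the 'user' field."""
    lookup = {row["user"]: row for row in right}
    return [{**row, **lookup[row["user"]]} for row in left if row["user"] in lookup]


raw_events = [
    {"user": " Alice ", "amount": "10.5"},
    {"user": "", "amount": "3"},
    {"user": "bob", "amount": "n/a"},
    {"user": "Carol", "amount": 7},
]

# Step 1: clean and filter before the rows are "written" to the warehouse table.
warehouse_table = list(clean(raw_events))

# Step 2: read the table back and join it with another source.
profiles = [{"user": "alice", "country": "UK"}, {"user": "carol", "country": "DE"}]
enriched = join_on_user(warehouse_table, profiles)
print(enriched)
```

In a managed pipeline the read, transform and write stages would be connectors and transforms supplied by the service; the sketch only shows the order of the steps.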

Are Google the Only Major Players Tapping into Data Flow?


Facebook have already developed a data flow architecture called Flux. Facebook’s video explaining Flux is a pretty good example of a data flow architecture, and demonstrates theirs at work within the Facebook messaging system. As the video explains, Flux “avoids cascading effects by preventing nested updates”. Simply put, Flux has a single-directional data flow, meaning additional actions aren’t triggered until the data layer has completely finished processing.
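
The idea of “preventing nested updates” can be sketched as a dispatcher that refuses to start a new action while one is still being processed. This is a minimal illustration in Python, not Facebook’s Flux code: the `Dispatcher` class, the `message_store` callback and the action names are all hypothetical, and rejecting the nested action with an error is just one possible way to enforce the rule.

```python
from typing import Callable, Dict, List


class Dispatcher:
    """Sends each action to every registered callback, one action at a time."""

    def __init__(self) -> None:
        self._callbacks: List[Callable[[Dict], None]] = []
        self._dispatching = False

    def register(self, callback: Callable[[Dict], None]) -> None:
        self._callbacks.append(callback)

    def dispatch(self, action: Dict) -> None:
        # Refuse nested dispatches so one update cannot cascade into another.
        if self._dispatching:
            raise RuntimeError("cannot dispatch while a dispatch is in progress")
        self._dispatching = True
        try:
            for callback in self._callbacks:
                callback(action)
        finally:
            self._dispatching = False


dispatcher = Dispatcher()
messages: List[str] = []
errors: List[str] = []


def message_store(action: Dict) -> None:
    """Data-layer callback that (wrongly) tries to trigger a follow-up action."""
    if action["type"] == "new_message":
        messages.append(action["text"])
        try:
            dispatcher.dispatch({"type": "mark_read"})  # nested update attempt
        except RuntimeError as exc:
            errors.append(str(exc))


dispatcher.register(message_store)
dispatcher.dispatch({"type": "new_message", "text": "hello"})
dispatcher.dispatch({"type": "mark_read"})  # fine: the first dispatch has finished
print(messages, errors)
```

The second top-level dispatch succeeds because the first has fully completed, which is the single-directional flow the video describes.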

FlumeJava, from which Cloud Dataflow evolved, is also involved in the process of creating easy-to-use, efficient parallel pipelines. At Flume’s core are “a couple of classes that represent immutable parallel collections, each supporting a modest number of operations for processing them in parallel. Parallel collections and their operations present a simple, high-level, uniform abstraction over different data representations and execution strategies.”

Many see Cloud Dataflow as a competitor to Kinesis, a managed service designed for real-time data streaming developed by industry leaders Amazon Web Services. Kinesis allows you to write applications for processing data in real-time, and works in conjunction with other AWS products such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, or Amazon Redshift.

Does Google Cloud Dataflow Mean the Death of Hadoop and MapReduce?

Since Cloud Dataflow is being used in place of MapReduce at Google, and Google have marketed Cloud Dataflow as having “evolved” from MapReduce, many have been proclaiming the death of MapReduce, and also of Hadoop, of which MapReduce is the core component.

On the subject, Ovum analyst Tony Baer told InfoWorld that Cloud Dataflow forms “part of an overriding trend where we are seeing an explosion of different frameworks and approaches for dissecting and analyzing big data. Where once big data processing was practically synonymous with MapReduce, you are now seeing frameworks like Spark, Storm, Giraph, and others providing alternatives that allow you to select the approach that is right for the analytic problem.”

It is true that MapReduce use is in decline. But that’s why Hadoop 2.0 introduced YARN, which allows you to circumvent MapReduce and run multiple other applications in Hadoop, all sharing common cluster management. One application that’s gained considerable attention is Spark, which, as InfoWorld states, can perform map and reduce operations in memory, making it much faster than MapReduce. Of course, such applications can run on top of Hadoop, so whilst there are now many alternatives to MapReduce, that doesn’t mean Hadoop is dead. Current Hadoop users largely store their data on-premises, and it’s unlikely that a considerable number of them are going to migrate all of their data to the cloud to use Cloud Dataflow. In short: Hadoop is safe for now.

This post will be updated as and when further updates about Cloud Dataflow are announced, to give you an up-to-date guide on advancements ahead of its release.


Eileen McNulty-Holmes – Editor

Eileen has five years’ experience in journalism and editing for a range of online publications. She has a degree in English Literature from the University of Exeter, and is particularly interested in big data’s application in humanities. She is a native of Shropshire, United Kingdom.
