Several big announcements came out of the Google I/O Conference in San Francisco yesterday. One of the more significant was Google’s decision to do away with MapReduce, the system that originally inspired Hadoop. Google apparently stopped relying on MapReduce “years ago”, according to Urs Hölzle, Google’s Senior VP of Technical Infrastructure. Its replacement is a newly announced cloud analytics platform known as Cloud Dataflow.

“Cloud Dataflow is the result of over a decade of experience in analytics,” Hölzle said. “It will run faster and scale better than pretty much any other system out there.” The need for a new system came about, he explained, when MapReduce became unable to handle the amount of data Google has to work with. Apparently, performance begins to decline dramatically once a dataset exceeds a couple of petabytes in size.

Cloud Dataflow, in contrast, is much more scalable. It also takes a lot of the legwork out of creating complex pipelines, as deployment, management and scaling are automated. Other data management products in Google’s roster of announcements include:

  • Cloud Save: An API that automatically saves users’ data so it can be used without any server-side coding. It will be a feature of Google’s App Engine and Compute Engine.
  • Cloud Debugging: Makes the process of rummaging through lines of code to identify bugs easier.
  • Cloud Tracing: Provides latency statistics and analysis.
  • Cloud Monitoring: An intelligent monitoring system that integrates Stackdriver (a startup Google recently acquired). It monitors infrastructure resources like disks and virtual machines, as well as a dozen open-source packages outside the Google product line.

It will be interesting to see whether Cloud Dataflow really is faster and more scalable than any other system, and what the announcement will do to public confidence in MapReduce and, by extension, Apache Hadoop.
