In our cartographic overview of the big data ecosystem, we stated that a big data environment should allow you to store, process, analyse and visualise data. Thus far in the “Understanding Big Data” series, we’ve been breaking down the ecosystem into its component parts, looking at software which specifically targets one or two of these objectives. In this edition, we’ll be examining the big data offerings of household names and global leaders, most of whom offer an end-to-end solution for storing, processing, analysing and visualising data.
As part of its range of cloud computing products and services, Google offers a big data analysis solution, Google BigQuery. The “big” in the title isn’t misleading: they allow you to process the first terabyte of data for free. Queries are written in an SQL-like language, and can be run from a browser tool, from a command-line tool or through the BigQuery REST API, using client libraries for languages such as Java, PHP or Python. It’s integrated with a variety of third-party analytics and visualisation tools, including Tableau, Jaspersoft and QlikView, as well as cloud connectors for services such as Talend, Informatica and Pervasive. We recently reported on how Google developers were using BigQuery to map the notability gender gap in Freebase, which you can read here.
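To give a feel for what going through the REST API looks like, here is a minimal sketch in Python that builds (but does not send) a request to the API's `jobs.query` method using only the standard library. The project ID and access token are hypothetical placeholders, the helper name `build_query_request` is our own, and the query targets Google's public Shakespeare sample table; in practice you would more likely use one of the official client libraries, which handle authentication for you.

```python
import json
import urllib.request

# URL template for the jobs.query method of the BigQuery REST API (v2).
BIGQUERY_QUERY_URL = (
    "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
)


def build_query_request(project_id, sql, access_token):
    """Build, without sending, an HTTP request that would run a SQL query."""
    body = json.dumps({"query": sql, "useLegacySql": False}).encode("utf-8")
    return urllib.request.Request(
        BIGQUERY_QUERY_URL.format(project_id=project_id),
        data=body,
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    project_id="my-example-project",  # hypothetical project ID
    sql=(
        "SELECT word, word_count "
        "FROM `bigquery-public-data.samples.shakespeare` LIMIT 5"
    ),
    access_token="PLACEHOLDER_TOKEN",  # hypothetical; a real OAuth 2.0 token is needed
)

# Inspect the request; passing it to urllib.request.urlopen would send it.
print(request.get_method(), request.full_url)
```

Nothing leaves your machine until the request is passed to `urllib.request.urlopen`, so the sketch can be inspected safely without credentials.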
Architecture of SAP HANA; source
SAP’s big data product line revolves around SAP HANA. When speaking with Dataconomy recently about their partnership with SAP HANA, Birst’s VP of Product Strategy said: “when you get to really large datasets, it can have response times that business users are not willing to wait for. What HANA represents is a world class, in-memory database.”
Speed is what HANA does best, claiming speeds up to 100,000 times faster than your current data platform. As well as lightning-fast speeds, SAP HANA can be integrated with Hadoop and SAP IQ, the company’s column-oriented, grid-based, massively parallel processing database. SAP HANA users include the NBA, eBay, P&G, Lenovo and Pacific Drilling.
SAP also offers a range of products for analytics, visualisation, text analytics and business intelligence, as well as applications specifically geared towards fraud detection, customer intelligence and equipment operations.
IBM has a whole portfolio of products for big data management. The core of this portfolio is IBM InfoSphere, a solution covering data integration, data warehousing, master data management, big data and information governance. This includes InfoSphere Streams, a stream computing solution for real-time analytics which can handle very high throughput; some of its components are open source. They also offer InfoSphere BigInsights, which uses Hadoop as a basis for processing vast amounts of structured and unstructured data.
They also offer the IBM Watson Explorer, fuelled by the technology of IBM Watson, which offers search, navigation and discovery of data sources. On the analytics side of things, they offer the IBM Smart Analytics System, an end-to-end analytics solution. Last week, they also announced they were taking their big data technology to the cloud with IBM Navigation on Cloud, which you can read more about here.
Microsoft Azure demo; source
Microsoft’s big data offering is its Azure platform. The Azure portfolio includes HDInsight, a 100% Hadoop-based service in the cloud. It’s run on a pay-for-what-you-use basis, allows you to develop in Java and .NET, and lets you visualise results using Microsoft Excel. They also offer Azure SQL Database, a scalable, self-managed relational database-as-a-service, used by customers such as Samsung and easyJet. Azure’s storage facility is durable cloud storage, which offers several solutions for integrating existing data and works with unstructured text or binary data such as video, audio and images. Last month they also announced Azure ML, a machine learning component which will allow users to build big data-based apps and APIs to predict future events.
Amazon Web Services offers solutions for every stage of big data management. They allow you to:
- Collect: AWS Direct Connect and their Import/Export service allow you to move data in and out of the cloud quickly. Inbound data traffic is free.
- Stream: Amazon Kinesis is their real-time big data streaming solution.
- Store: Amazon Simple Storage Service (S3) is pay-for-what-you-use cloud storage.
- Process: They have NoSQL (Amazon DynamoDB), RDBMS (Amazon RDS) and Hadoop (Amazon Elastic MapReduce) offerings.
They also have an AWS Marketplace, which is essentially a giant catalogue of big data tools all in one location.
An overview of HP HAVEn; source
HP’s big data offering is called HAVEn. It stands for Hadoop/HDFS, Autonomy IDOL (which processes and indexes information), Vertica (for real-time analytics), Enterprise Security and nApps (apps, with an n on the front; we don’t get it either). Vertica, which was acquired by HP in 2011, promises (when partnered with the HP ConvergedSystem 300) between 50 and 1,000 times the performance and 70% cost savings, and takes only days to deploy. Partners of the HAVEn platform include Accenture Analytics, Deloitte Consulting LLP and Capgemini.
Of course, solutions from such established names come at a premium. Although many offer pay-as-you-use pricing, some of the technologies mentioned can cost $300,000 for hardware, software and services. On the other end of the financial spectrum, in the next edition of Understanding Big Data we’ll be looking at open source solutions: what technologies are available on the open source model, what opportunities they offer and how much you can actually get without a price tag attached.
(Featured image credit: Microsoft Azure)
Eileen has five years’ experience in journalism and editing for a range of online publications. She has a degree in English Literature from the University of Exeter, and is particularly interested in big data’s application in humanities. She is a native of Shropshire, United Kingdom.