Cisco predicts that by 2020, we’ll have 50 billion connected devices. To put that in perspective, if current population projections are accurate, that’s between six and seven connected devices for every person on the planet. Of course, this many data-generating devices will be a treasure trove of opportunity- but also a complete nightmare from a data processing point of view. We discussed how the Internet of Things will change our lives with MapR’s CMO Jack Norris- from how he expects technology to handle this data explosion, to which IoT applications he considers the most promising.


Talk us through some of the opportunities of the Internet of Things.

Our lives will become increasingly connected- from our lives as consumers (with the rise of wearable tech), to our work lives (by using smart devices and equipment) and as inhabitants of smart cities and infrastructures.

The Internet of Things is an exploding source of information, and the business opportunity here is that organisations that move early and figure out the best way to leverage this information and integrate it into their business are seeing some dramatic results. The analytics are built into the process- in manufacturing, for instance, we’re seeing information gathered from across the world about equipment failure in global industries. We can then see the status of the equipment right before failure, try to figure out what other equipment is in a similar state, and schedule preventive maintenance or take corrective action prior to other failures.
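As a rough illustration of the preventive-maintenance idea described above, the sketch below compares each machine's latest readings with a snapshot recorded just before a known failure and lists the machines that look similar. The machine names, the two metrics (temperature and vibration) and the distance threshold are all hypothetical, and a real system would normalise the metrics and use a trained model rather than a plain Euclidean distance.

```python
from math import dist

# Hypothetical snapshot (temperature in C, vibration in g) taken just before a failure.
PRE_FAILURE_SIGNATURE = (92.0, 0.8)

# Hypothetical latest readings for other machines in the fleet.
fleet = {
    "press-1": (70.0, 0.2),
    "press-2": (91.0, 0.7),
    "press-3": (93.5, 0.9),
}

def similar_units(fleet, signature, max_distance):
    """Return the names of machines whose state lies within max_distance of the signature."""
    return sorted(
        name for name, state in fleet.items()
        if dist(state, signature) <= max_distance
    )

# Machines that resemble the pre-failure state are candidates for maintenance.
at_risk = similar_units(fleet, PRE_FAILURE_SIGNATURE, max_distance=2.0)
print(at_risk)
```

Because the metrics here are not scaled, temperature dominates the distance; that is acceptable for a toy example but would need attention in practice.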

We’re also seeing more applications in the security and fraud area, and data that helps provide more context so that you can fine-tune or tailor a product for individual customers. So that’s where the business opportunities are.

What about the challenges?

From the challenges standpoint, the amount of data, the speed of the data and the variety of information sources are really swamping existing infrastructure and tools. We’ve been talking about the volume, velocity and variety of data for quite some time, and I think the Internet of Things takes each of those dimensions and just accelerates them. When we talk about volume of data, if you think about 20 billion devices all providing a continuous stream of information, the volume of data just goes through the roof. One example is that a single jet engine can generate a terabyte of information on one transatlantic flight. So we’re talking about a tremendous volume of data.

In terms of velocity, what’s required is the ability to process this data on a streaming basis and be able to adjust quickly. Inputting the data, doing some sort of transformation from the data warehouse and being able to see the data days later is not really effective in the Internet of Things context. Our CTOs are talking about the write-intensive nature of data sources, where you need a platform that can handle the stream and can quickly be used to filter and identify interesting information, separating the signal from the noise.
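To make "separating the signal from the noise" on a stream a little more concrete, here is a minimal sketch that processes readings one at a time and flags values that deviate sharply from a rolling baseline. It is not MapR's implementation; the function name, window size, threshold and sample readings are assumptions chosen purely for illustration.

```python
from collections import deque
from statistics import mean, pstdev

def flag_outliers(stream, window=5, threshold=3.0):
    """Yield readings that sit more than `threshold` standard deviations
    away from the mean of the last `window` normal readings."""
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield value
                continue  # keep the outlier out of the baseline
        recent.append(value)

# Hypothetical sensor readings with one obvious spike.
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 10.1, 25.0, 10.0, 9.8]
spikes = list(flag_outliers(readings))
print(spikes)
```

The generator consumes each value as it arrives and keeps only a small window in memory, which is the property that matters when the data cannot wait for a batch job.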

What are the architectural requirements and challenges the Internet of Things presents?

What’s required is a platform that can scale very quickly. We’re not just asking “can the system handle terabytes or petabytes?”, but “can it handle millions or billions of small individual files?” A hundred million files is not that large from an Internet of Things perspective, so can it scale to a billion, can it scale to a trillion files? That is the area that MapR has provided a platform for from the very beginning.

The ability to combine deep predictive analytics with real-time capabilities is an absolute requirement. So an integrated, in-Hadoop database is a key feature.

How do you handle different languages between different devices?

The term that has been used with respect to this has been “polyglot persistence”. Instead of trying to find one common data format, you need a platform which is able to process and deal with many different native formats. So it’s more focused on the platform having the flexibility instead of trying to eliminate the flexibility and use a common format.

That said, some of the more exciting data formats are self-describing formats, like JSON. There’s no separate schema and no separate metadata that needs to be set up with the data- you can have a very complex nested file format, and then the pressure’s on the platform to be able to deal with these formats.
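A small sketch may help show what "self-describing" means in practice: the structure of a JSON record can be discovered from the record itself, with no schema supplied in advance. The device record and its field names below are invented for illustration, and the `flatten` helper is a hypothetical utility, not part of any particular platform.

```python
import json

# Hypothetical device message; the field names are made up for this example.
raw = (
    '{"device_id": "pump-17", "ts": 1413936000, '
    '"readings": {"temp_c": 71.5, "vibration": {"x": 0.02, "y": 0.11}}}'
)

def flatten(value, prefix=""):
    """Walk a nested JSON value and return {dotted.path: leaf_value} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix[:-1]: value}  # drop the trailing dot from the path
    out = {}
    for key, child in items:
        out.update(flatten(child, f"{prefix}{key}."))
    return out

record = json.loads(raw)
fields = flatten(record)
for path, leaf in fields.items():
    print(path, "=", leaf)
```

The nesting and field names come entirely from the data, so a new device could add fields without anyone registering a schema first.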

Do you think Hadoop has sufficiently low latency to deal with Internet of Things applications?

If you look at Apache Hadoop and commercialised Hadoop distributions, one of the big issues is the underlying data format, the data platform. HDFS is a write-once file system, so it does not support streaming files; you cannot continuously write and analyse- which introduces a key batch constraint to Hadoop. Even with HBase running on top of these platforms, you have a lot of latency spikes and latency issues, ultimately because of the underlying data platform on which HBase is attempting to write and store the data.

This is a problem area we at MapR recognised very early on, when we were first founding the company. What we’ve done at MapR is provide full support for open source, incorporating all of the packages associated with Hadoop, and provide an enhanced data platform- one of the enhancements is a complete random read/write storage layer so that we can support streaming datasets. You can have data continuously written to the MapR platform and do analytics directly on that. We have also integrated an in-Hadoop database that supports HBase applications and does so with consistently low latency. Real-time application means operational analytics that adjust to business as it happens. It’s fully supported and used today by MapR customers.

Are there any Internet of Things projects or use cases you find particularly interesting?

There’s such a wide range. Some of the agricultural projects we see are interesting: they use deep predictive analytics that look at the weather, combined with individual farm equipment measuring soil condition and type. They allow you to see exactly what’s going on in that little micro-climate within a particular field. We’re seeing better applications of fertiliser because of this- this is a great example of the Internet of Things influencing operations on a micro-level. We’ve also seen some examples in smart cities, and in manufacturing and the supply chain- but I think we’re just at the tip of the iceberg.


(Image credit: Artur Staszewski)
