In the first part of our interview with Sean Owen, Cloudera’s Director of Data Science, we discussed the relationship between machine learning and Hadoop, the future of Apache Mahout and why machine learning has become such a hot property. In this part of our discussion, we delved into the future of deep learning and neural networks, and how Owen foresees the relationship between machine learning and the enterprise evolving.
What do you think are some of the main trends in machine learning right now?
To be honest, I think machine learning is still an advanced topic for enterprises. The infrastructures of most enterprises are built around reporting and retroactive analytics, and predictive analytics is still considered difficult and expensive. There is some truth to this, but at least we’re finding tools and techniques breaking into the mainstream as open-source tools. So it’s now at least plausible that your average enterprise could access and deploy machine learning. So I view last year and this year as a real time of awakening to the possibilities. Most of our customers are just getting started, but they are successfully deploying what five years ago would have looked like very sophisticated and expensive predictive analytics. Most customers are looking at things like recommender use cases, with anomaly and fraud detection a close second.
We recently spoke to Twitter’s Machine Learning Engineer, Jake Mannix. He believes machine learning is going the same way as big data did a couple of years ago: people are starting to get excited about it from outside of the industry, but they still don’t really understand what it is, or how to use it. Do you agree?
Yeah, that’s a good analogy from Jake. I think people tend to project a lot onto the ideas of big data and machine learning. The promise seems magical; I cue up some data and I wave a magic wand, then I get some learning out. In practice, it’s not that simple. Certainly five years ago, it wasn’t simple at all; you had a team of specialists, and you didn’t do it at scale. Those things are changing: scale is no longer a problem, we have a lot more automation, and we have a lot more knowledge within the engineering community as well. So before, you had statisticians who did not know the first thing about building these analytics systems, and you had engineers who did not know the first thing about statistics. Now the crossover between the two, the field of data science, is getting much larger. So there is certainly a lot of hype, and a lot of hope projected onto machine learning. I think we are settling into the reality of what machines can and can’t do at the moment. And that’s good; it means we’re starting to find people actually connecting hype to reality and building real systems, and that’s great. So I think that the reality is quickly catching up with the dream.
Moving forward, are there any current advancements within the field of machine learning that you’re particularly excited about?
That’s an interesting question. I think most people in research will talk about deep learning and neural nets, which isn’t even a new idea, but it has gotten a whole lot of traction again because we’re figuring out how to make them work at large scale.
My interest is more in scaling up and making updates to conventional predictive models. My interest is very much more practical and operational. I’m interested in parallel algorithms; it’s an area of machine learning research that gets a lot less attention, but it’s probably more important to your average enterprise. We need these algorithms to work at scale, in real time, rather than just focus on seeking out another percent or two of accuracy. So I’d point to research in online or streaming algorithms as most interesting to me.
Deep learning and large neural networks have been getting a lot of press attention recently as well. Yet most of the publicly unveiled deep learning systems from massive companies like Google or Microsoft have focused almost exclusively on image recognition. What do you see in the future of deep learning?
Great question. I’m glad you said that because I too haven’t really seen deep learning make a big dent in any application, except for image recognition and classification, and maybe related fields. In theory, neural networks should be able to improve a lot of classification problems; in practice, they really haven’t. They’re good at classification over a large number of continuous values, and that’s exactly image recognition. I suppose they probably have promise for audio recognition, obviously. I honestly have the same question that you do; I’m not sure it’s great for anything except image recognition and related fields. But improving image recognition is probably valuable to the large tech giants; for the other 95% of enterprises, I’m not so sure.
(Image credit: Cloudera)