As happens when boundless potential meets hard reality, enterprises now face a long, painful slog through the trenches of disillusionment and disappointment as they pursue the business transformation promised by Machine Learning for the Enterprise. The machine learning hype cycle is in overdrive, inflating expectations for magically easy and automated solutions to complex business problems decades in the making.

The problem is this: machine learning, of course, learns from data, with its algorithms continually improving as they process more of it. But typical enterprise data is siloed and dirty, noisy and disparate – the natural byproduct of decades of idiosyncratic business process automation initiatives and M&A activity. So when those machines try to learn from stinky, messy data, how smart do you really think they’ll get?

Sending the right offer to the right customer at the right time – every time

Not very, as it turns out. Consider a scenario that could be faced by many a large, analytically driven consumer business. Programmers are assigned to build a machine learning system designed to optimize outbound upsell offers to customers based on dozens of internal and external data types, from date of last purchase to long-range weather forecasts. The system is tested rigorously on a large sample of corporate data, and it sends the right offer to the right customer at the right time – every time. So far, so good. But in production it fails, frequently emailing repeat offers to the same (increasingly frustrated) customers.

The reason is this. Over the years, different divisions recorded different formats and versions of the same customer names in their databases. For a few hundred customers in a couple of divisions, cleaning and unifying these records is easy enough with traditional, manually driven methods. But at the scale of multiple divisions and tens of millions of customers … this data preparation is far from easy. The continued growth and diversification of large enterprises – and by extension of their many data sources – have simply overwhelmed the capacity of traditional approaches to handle this ever-increasing load.
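To make the problem concrete, here is a minimal Python sketch, assuming customer names are the only field being matched. The function names (`normalize_name`, `similarity`, `find_duplicates`) are hypothetical illustrations, not part of any product mentioned in this article. It shows why exact comparison fails on records like “Smith, John A.” and “JOHN A SMITH”, and how simple normalization plus fuzzy string similarity (here, the standard library’s `difflib.SequenceMatcher`) can catch them.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_name(raw: str) -> str:
    """Map differently formatted versions of a name to one canonical string."""
    # Strip accents: "García" -> "Garcia".
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Reorder "Last, First" into "First Last".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first} {last}"
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def similarity(a: str, b: str) -> float:
    """Score two raw names from 0.0 (unrelated) to 1.0 (identical after normalization)."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def find_duplicates(names: list[str], threshold: float = 0.9) -> list[tuple[int, int, float]]:
    """Compare every pair of records and return those likely to be the same customer."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


records = ["Smith, John A.", "JOHN A SMITH", "Jon Smith", "Maria García", "GARCIA, MARIA"]
print(find_duplicates(records))  # [(0, 1, 1.0), (3, 4, 1.0)]
```

On these sample records the two Smith variants and the two García variants score 1.0, while “Jon Smith” scores about 0.86 against the other Smiths: close, but not certain enough to merge without someone who knows the data. Note also that `find_duplicates` compares every pair of records, so its cost grows quadratically; at tens of millions of customers that alone is prohibitive, which is why large-scale matching systems typically restrict comparisons to candidate pairs that share some key (a technique usually called blocking).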

Machines do the heavy lifting

Quite simply, enterprises have run up a massive “technology debt” related to their “data mess,” and they will be unable to realize the transformative potential of machine learning until they pay it off. As one enterprise executive put it recently, “it took us 40 years to get into this data hole, and we’re not going to get out of it overnight.”

Fortunately, there is a way for enterprises to rapidly pay down vast amounts of that debt, and the solution is – you guessed it – machine learning. Highly repetitive data preparation tasks that once fell to humans alone, such as cleaning and unifying disparate records, can now be taken on by machines – at scale and with unprecedented speed and precision. With the machines doing the heaviest lifting, the humans who are most familiar with the data can then do what they do best – validating and guiding what the machine does through feedback loops that generate ever-increasing levels of accuracy and efficiency. When applied to the problems of data quality and unification, this model of human-guided machine learning can spare skilled developers from low-level work and, at the same time, open up new opportunities for analytics that cross many data sources.
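The division of labor described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not Tamr’s or anyone else’s actual method: the “model” is reduced to a single similarity threshold, the score function is supplied by the caller, and all names (`triage`, `refit_threshold`, `human_guided_round`, `ask_expert`) are hypothetical. The machine settles the clear-cut pairs on its own, routes only the borderline ones to an expert, and uses the expert’s answers to re-tune its threshold for the next round.

```python
from dataclasses import dataclass, field
from typing import Callable

Pair = tuple[str, str]
Scorer = Callable[[str, str], float]  # hypothetical: any pairwise similarity function


@dataclass
class TriageResult:
    matched: list = field(default_factory=list)       # machine is confident: same entity
    rejected: list = field(default_factory=list)      # machine is confident: different entities
    needs_review: list = field(default_factory=list)  # uncertain: route to a human expert


def triage(pairs: list[Pair], score: Scorer, threshold: float, margin: float = 0.05) -> TriageResult:
    """Let the machine decide clear-cut pairs and queue borderline ones for experts."""
    result = TriageResult()
    for pair in pairs:
        s = score(*pair)
        if s >= threshold + margin:
            result.matched.append(pair)
        elif s <= threshold - margin:
            result.rejected.append(pair)
        else:
            result.needs_review.append(pair)
    return result


def refit_threshold(labeled: list[tuple[Pair, bool]], score: Scorer) -> float:
    """Pick the candidate threshold that agrees with the most expert labels."""
    scored = [(score(*pair), is_match) for pair, is_match in labeled]
    candidates = sorted({s for s, _ in scored})
    return max(candidates, key=lambda t: sum((s >= t) == is_match for s, is_match in scored))


def human_guided_round(pairs, score, threshold, ask_expert, labeled):
    """One feedback cycle: triage, collect expert labels, then refit the threshold."""
    result = triage(pairs, score, threshold)
    for pair in result.needs_review:
        labeled.append((pair, ask_expert(pair)))
    # Refit only once experts have confirmed at least one match and one non-match.
    if {is_match for _, is_match in labeled} == {True, False}:
        threshold = refit_threshold(labeled, score)
    return result, threshold
```

A production system would replace the single threshold with a classifier trained on many features, but the shape of the loop stays the same: the expert sees only the pairs the machine is unsure about, and each round of labels is meant to leave fewer such pairs for the next.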

Human-guided Machine Learning for the Enterprise

The multinational business information company Thomson Reuters offers a terrific example of human-guided machine learning applied to enterprise data preparation. Hundreds of acquisitions and the Thomson and Reuters merger in 2008 left the Global 2000 company challenged to connect and integrate the resulting siloed content assets at an entity level. To solve the problem, Thomson Reuters worked with Cambridge-based (and MIT CSAIL-birthed) Tamr, which uses a “human-guided machine learning” approach to unify hundreds of enterprise data sources quickly and accurately. By heavily automating the unification process (3 million+ entities) while bringing experts into the workflow for validation, the system sustained precision and recall rates over 95%, expedited integration by several months and reduced manual effort by more than 40%.

Another example of applying machine learning to data before building machine learning on top of that data is Toyota Motor Europe (TME). With hundreds of distributors and 30 siloed geographies in Europe, the company was challenged to connect its customer data into a comprehensive view of the customer journey. Working with Tamr, TME unified dozens of customer repositories quickly and accurately. By heavily automating the unification process (for 10+ million customers) while bringing experts into the workflow for validation, Tamr used machine learning to build highly accurate base data sets with many signals across the customer journey. TME recently described how Tamr’s machine-human collaboration – specifically, “the inclusion of expert feedback in the model generation” – helped “ensure trust and accuracy” in its transcontinental enterprise data unification initiative. This rich, accurate data set can now be leveraged by TME data scientists across Europe, who are building predictive algorithms to improve customer segmentation, satisfaction and loyalty.

What is revolutionizing Machine Learning for the Enterprise?

In the short term, machine learning’s biggest impact on businesses like Thomson Reuters and Toyota Motor Europe may actually be to clean up data in preparation for the many predictive applications that large enterprises find attractive. It’s essential for large enterprises to build the next generation of AI and predictive applications on a solid foundation of clean, organized data – which is only possible if they use machine-driven, human-guided curation approaches.

In other words, to find effective solutions leveraging Machine Learning for the Enterprise, apply the most advanced machine learning to your data so that you can then use the most advanced machine learning on your data.


Image: Enterprise, CC BY-NC 2.0, Mark Zwolanek
