Monday saw the unveiling of Tamr, a scalable platform for data curation, at the Databeat Conference in San Francisco. Founders Andy Palmer and Michael Stonebraker also announced they had raised $16 million in first-round funding from Google Ventures and New Enterprise Associates.
Palmer and Stonebraker came up with the idea for the software during their research at MIT, where they discovered widespread demand among tech companies for software that would let them prepare data for analysis faster. “We came to the conclusion that what was required was something that automated the integration of new sources and new attributes over time and would insulate the system from the changes in the fundamental sources so that you didn’t have to go … and re-engineer from the top down all the time all this ETL [extract, transform, and load work],” said Palmer.
Tamr sits on top of a company’s existing databases and provides a holistic view across all of its systems. Using a machine learning algorithm, Tamr roots out similar sets of data across different databases and delivers a report on them to the company. Each report item carries a ‘confidence rating’ (i.e., an indicator of how certain the programme is that the datasets are similar). Human input is then required to decide whether the data really is similar; if it is, Tamr maps the two fields or columns together, making the systems more cohesive and streamlined.
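To make that workflow concrete, here is a minimal sketch in Python of the general pattern: score candidate column pairs, report them with a confidence value, and let a person confirm each mapping. It is not Tamr's algorithm, which the announcement does not detail. The Jaccard overlap used as the confidence score, the function names (`suggest_matches`, `confirm`) and the sample data are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Columns = Dict[str, List[str]]          # column name -> values in that column
Suggestion = Tuple[str, str, float]     # (column in A, column in B, confidence)


def jaccard(a: List[str], b: List[str]) -> float:
    """Share of distinct values the two columns have in common (0.0 to 1.0)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def suggest_matches(source_a: Columns, source_b: Columns,
                    threshold: float = 0.5) -> List[Suggestion]:
    """Report column pairs whose overlap meets the threshold, best first.

    The overlap stands in for the 'confidence rating'; a real system
    would presumably use a trained model rather than this simple measure.
    """
    suggestions = []
    for name_a, values_a in source_a.items():
        for name_b, values_b in source_b.items():
            score = jaccard(values_a, values_b)
            if score >= threshold:
                suggestions.append((name_a, name_b, score))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


def confirm(suggestions: List[Suggestion],
            approve: Callable[[str, str, float], bool]) -> Dict[str, str]:
    """Keep only the mappings a reviewer approves (the human-input step)."""
    return {a: b for a, b, score in suggestions if approve(a, b, score)}


# Hypothetical sample data from two separate systems.
crm = {
    "cust_name": ["Acme", "Globex", "Initech"],
    "city": ["Boston", "Austin", "Denver"],
}
billing = {
    "customer": ["Acme", "Globex", "Umbrella"],
    "amount": ["100", "250", "75"],
}

suggestions = suggest_matches(crm, billing)
# A lambda stands in for the reviewer here; in practice a person decides.
mapping = confirm(suggestions, lambda a, b, score: True)
```

In this toy data the only suggestion is the `cust_name` and `customer` pair, since the two columns share two of their four distinct values. The `approve` callback is where a human decision would plug in, so nothing is mapped on the score alone.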
Using this software to integrate and combine datasets could give companies a competitive advantage in data analysis over firms that explore only one dataset in isolation and so miss the wider perspective. Rich Miner of Google Ventures, one of Tamr’s primary investors, said of its value: “Businesses can’t keep up with the number and depth of data sources exploding within their companies. Tamr combines machine learning and corporate knowledge to unlock a unified view of companies’ most valued data repositories.”