Paxata launched in October 2013, after 18 months of development, with the aim of consolidating the world of data cleaning. Ventana Research, an analyst firm, has found that roughly 40–60% of a data scientist's time is spent cleaning data in preparation for later analysis. To improve efficiency on this front, Paxata has now released a tool that targets both the nature and the quality of the data. It provides a very early assessment of whether the data collected is in fact the data required for the desired analyses, shortening lead times within the cycle of collecting data, checking it, and subsequently adding new data sets.

In addition, Paxata claims to detect similarities and associations between data values in order to resolve syntactic and semantic quality issues automatically. Its algorithms are based on machine learning, so the tool becomes more capable as the underlying database grows.
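Paxata's actual algorithms are proprietary, but the *syntactic* side of the problem can be illustrated with a common, much simpler technique: clustering near-duplicate string values (for example, inconsistent spellings of the same company name) using a string-similarity measure. The sketch below uses Python's standard-library `difflib`; the threshold of 0.7 and the sample names are arbitrary choices for illustration, not anything from Paxata.

```python
# Illustrative sketch only -- not Paxata's method. Groups near-duplicate
# strings with a simple similarity ratio from the standard library.
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.7) -> bool:
    """True if the two strings, normalized, are close matches."""
    a_norm = a.lower().strip()
    b_norm = b.lower().strip()
    return SequenceMatcher(None, a_norm, b_norm).ratio() >= threshold


def cluster_values(values):
    """Greedily group values that are similar to a cluster's first member."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similar(value, cluster[0]):
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


names = ["Acme Corp", "ACME Corp.", "Acme Corporation", "Globex", "globex inc"]
print(cluster_values(names))  # groups the Acme variants and Globex variants
```

A real data-preparation tool would go well beyond this, for instance by weighting token order, using phonetic encodings, and learning from user corrections, but the core idea of resolving syntactic variants via similarity is the same.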
