The call to democratize data analytics today is unmistakable. The desire and need to analyze company data within every department has led to this broad trend up and down the corporate landscape. By definition, data democratization is the principle that each one of us has access (and rights) to data – without roadblocks or tolls. The theory is that data analysis done with common skills, such as spreadsheet creation and examination, will accelerate decision-making and enable organizations to discover new opportunities. Achieving this goal, however, runs into a few complex roadblocks.
Busting out the big guns
The ability to collect, organize and analyze company data has become increasingly integrated and accessible – that is, if you have the time and money to do it. Companies such as Informatica and Talend have been helping big organizations tame data exhaust for well over ten years, and newcomers like Tamr are using machine learning to streamline the process. What all of these vendors have in common is a big price tag and forklift infrastructure to install. Each of these services is geared for and sold as an enterprise solution, and each competes with the even bigger guns of IBM, SAP and Oracle.
However, democratization requires freedom. The above solutions may be good, but they have tremendous vendor lock-in. Today, in a world that demands companies’ participation in the information age, these “big gun” offerings cannot be the answer.
Democratization means freedom
As mentioned, achieving true democratization of data analytics means there should not be any roadblocks or tolls. Frictionless data acquisition – the ability to store data without significant penalty for complexity or cost – is key to this premise. Once data is stored, the next core aspect of democratization is the ability to immediately make sense of that information and to disseminate access to it. If access and insights cannot be derived promptly, this lack of speed should also be considered a roadblock to democratization.
Therefore, the democratization thesis rests on two founding principles: information should be easily and quickly stored, and it should be easily and quickly retrieved to derive and drive business value. The concepts of “easy” and “quick” are central to this premise. The big, legacy data analysis players have achieved this relatively well through an army of skilled workers and a large price tag. However, since the high cost of these services is a major toll on democratization, do-it-yourself solutions have become the alternative.
Let’s look at an example. Instead of purchasing a complete solution from start to finish, companies design, build and implement each vertical aspect of the flow of information. Typically, this starts with centralized storage. From there, information goes through an Extract, Transform, Load (ETL) process into a database that structures data to be either queried or extracted again into some Business Intelligence visualization tool. Once data reaches Business Intelligence tooling, data analysis truly begins. Although these step-by-step solutions come at a lower cost than many large legacy systems, the DIY journey towards effective data analysis is still not in line with the core principles of democratization since it requires significant skill to obtain the analysis.
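To make the DIY flow above concrete, here is a minimal sketch of one Extract, Transform, Load pass in Python. It is an illustration only, not a description of any particular vendor's product: the sample data, the `sales` table and every function name are hypothetical, an inline CSV string stands in for a file pulled from centralized storage, and an in-memory SQLite database stands in for the structured database that a Business Intelligence tool would query.

```python
import csv
import io
import sqlite3

# Hypothetical raw export, standing in for a file pulled from centralized storage.
RAW_CSV = """region,amount
north, 120.50
south,80
north,not-a-number
south, 19.50
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: trim region names, cast amounts to float, drop unparseable rows."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["region"].strip(), float(row["amount"])))
        except ValueError:
            continue  # skip malformed records rather than guess a value
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a structured table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def totals_by_region(conn):
    """The kind of aggregate query a BI tool might issue against the table."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


def run_pipeline(raw_text):
    """Run extract, transform and load in order, then return the summary."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(raw_text)))
    return totals_by_region(conn)


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Even at this toy scale, someone has to decide how to parse the input, what to do with bad records and how to shape the target table before any analysis can happen. That is the skill requirement the DIY route carries.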
What has been learned over the years is that storage that is simple to use, elastic and cost-efficient can be a foundational choice on the road to democratization. Cloud storage is a particularly good place to make such an architectural bet since it allows for easy and quick data storage. However, the ability to perform analytical retrieval without complex and costly ETL has, so far, eluded the market. The popularity of Hadoop (the separation of storage and processing), along with its public failures, has caused many to believe that Data Lake democratization is nothing more than a complex and costly research experiment gone wrong, leading to the Data Swamp phenomenon. But is the Data Lake philosophy the problem, or is it the tooling surrounding it?
Freedom is empowered by intelligence
A Data Lake philosophy built on object storage, such as Amazon S3, seems to uphold the core democratization principles. The missing piece is the ability to easily transform raw data into information that businesses can use directly. Traditionally, this analysis is performed by highly skilled data engineers and/or data scientists. And yet, there is a trend toward intelligent automation: using Artificial Intelligence (AI) to transform all aspects of our economy. The idea that intelligent systems can automate manual procedures easily, quickly and cost-efficiently is now commonplace. Smart applications, services and devices have allowed each one of us to become more productive. It is this trend that will most certainly lead to smart object storage and the next phase in data analytics democratization.
Today’s data landscape enables all organizations and people to participate in the new information age. However, many organizations are still wasting money using legacy, high-cost business intelligence solutions to derive value from their data. Alternatively, DIY solutions can create roadblocks including increased complexity and limited capabilities. Now, using intelligent tools built on object storage, freedom from the cost and complexity of traditional services is achievable. As a simple, elastic and cost-effective solution for data analysis, object storage will serve as the foundation for future data lake architectures, enabling easy and quick data analysis to derive Business Intelligence.