Dataconomy

Six reasons to think twice about your data lake strategy

by Vikram Bhalchandra
July 23, 2018
in Big Data, Cybersecurity, Data Science

Since data has been called the “oil” of the new economy, it’s easy to assume that more is better. You can never have too much oil, so surely the same goes for data, right?

That assumption helps explain the hype around data lakes over the past few years. According to TechTarget, a data lake is “a storage repository that holds a vast amount of raw data in its native format until it is needed.” The hype is understandable, since data lakes are generally cheaper than enterprise data warehouses. On an abstract level, the idea of stockpiling data first and finding a use for it later also sounds like common sense.

If you’ve lately been sold on the need for a data lake, here are six things to consider before jumping in:

 

The amount of data is exponentially increasing

The digital universe doubles in size every two years, and the amount of data we create and copy annually is set to reach 44 zettabytes by 2020, about 10 times the 4.4 zettabytes of 2013. It stands to reason that building ever-larger repositories for all of your structured and unstructured data will eventually run up against cost limits. Even where cost is not the constraint, the sheer volume of incoming data will be a growing challenge for organizations that haven’t yet decided how to make sense of the data they already have.
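As a rough illustration of how a two-year doubling period compounds, and what it means for a storage budget, here is a minimal Python sketch. The function names and the 100 TB / 800 TB figures are hypothetical; the only input taken from the text is the two-year doubling period, and real data growth inside any single organization will not follow a clean exponential curve.

```python
import math


def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """How many times larger a volume becomes if it doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


def years_until_full(current_tb: float, capacity_tb: float,
                     doubling_period_years: float = 2.0) -> float:
    """Years until a store of `capacity_tb` is exhausted, assuming (hypothetically)
    that data keeps doubling at a steady rate. Inverts the doubling formula."""
    if current_tb <= 0:
        raise ValueError("current_tb must be positive")
    if capacity_tb <= current_tb:
        return 0.0  # already full (or over capacity)
    return doubling_period_years * math.log2(capacity_tb / current_tb)


if __name__ == "__main__":
    print(growth_factor(6))            # → 8.0  (three doublings in six years)
    print(round(growth_factor(7), 1))  # → 11.3 (3.5 doublings in seven years)
    print(years_until_full(100, 800))  # → 6.0
```

The log2 form simply inverts the doubling formula: capacity eight times larger than today's data buys three doublings, which at this rate is only six years.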

 

Your chance of holding on to “bad” data rises

Under the GDPR, companies can be fined up to 4 percent of their worldwide annual turnover (or €20 million, whichever is higher) for processing personal data without a lawful basis such as the consumer’s consent. For companies that have already built a data lake, ensuring GDPR compliance can be a major headache. GDPR also shows the danger of a store-everything approach should similar legislation appear elsewhere. Given the serious concerns raised globally by the Facebook data scandal, it’s only a matter of time before the power to control data shifts from enterprises to consumers around the world. GDPR is likely the first of many such compliance laws. In this environment, a data lake without a clear strategy for its data can become a millstone around the neck.
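The “whichever is higher” rule in Article 83(5) GDPR is easy to misread, so here is a small Python sketch of the statutory ceiling for an undertaking: the greater of €20 million or 4 percent of total worldwide annual turnover for the preceding financial year. It computes only the upper bound, since regulators set actual fines case by case, and the function and constant names are hypothetical. `Decimal` is used to avoid floating-point rounding on currency amounts.

```python
from decimal import Decimal

# Article 83(5) GDPR ceiling for an undertaking: the higher of these two amounts.
GDPR_FIXED_CAP_EUR = Decimal("20000000")  # EUR 20 million
GDPR_TURNOVER_RATE = Decimal("0.04")      # 4% of worldwide annual turnover


def max_gdpr_fine_eur(annual_turnover_eur: Decimal) -> Decimal:
    """Upper bound of a higher-tier GDPR fine, given the preceding year's turnover in EUR."""
    return max(GDPR_FIXED_CAP_EUR, GDPR_TURNOVER_RATE * annual_turnover_eur)


if __name__ == "__main__":
    # Smaller company: 4% of EUR 100M is EUR 4M, so the EUR 20M figure applies.
    print(max_gdpr_fine_eur(Decimal("100000000")))
    # Larger company: 4% of EUR 2B is EUR 80M, which exceeds EUR 20M.
    print(max_gdpr_fine_eur(Decimal("2000000000")))
```

For any company with turnover below €500 million, the fixed €20 million figure is the binding ceiling; above that, the percentage takes over.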

 

Security is often an afterthought

Data in a data lake often lacks the standard security protections that come with a relational database management system or an enterprise data warehouse. In their rush to be “agile,” some companies even give trusted business managers Internet-based access to their data lakes. In practice, this can leave the data unencrypted and without proper access control. Several cases of inappropriate data access have become public and have caused significant damage to the reputation and bottom line of leading companies.

 

Lack of quality control can turn your data lake into a swamp

The idea behind data lakes is that if you gather and store enough data, you will be able to glean business-relevant insights. This ignores the old computing maxim of “garbage in, garbage out,” though. If there are no guidelines about the cleanliness of the data, then your so-called insights will be flawed. This long-standing data problem is magnified many times over at big data scale. Data lakes also add the complexity of unstructured data, which raises the risk of accumulating data that nobody can use.
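One way to turn “guidelines about the cleanliness of the data” into practice is to validate records before they land in the lake and quarantine anything that fails. The following is a minimal Python sketch of such a gate under that assumption; the rule helpers, `GateResult`, `quality_gate`, and the `customer_id`/`amount` fields are all hypothetical, and production pipelines would likely rely on a dedicated validation tool instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
# A rule returns an error message when a record fails, or None when it passes.
Rule = Callable[[Record], Optional[str]]


def require_fields(*names: str) -> Rule:
    """Rule: every named field must be present and non-empty."""
    def rule(rec: Record) -> Optional[str]:
        missing = [n for n in names if rec.get(n) in (None, "")]
        return f"missing fields: {', '.join(missing)}" if missing else None
    return rule


def non_negative(name: str) -> Rule:
    """Rule: a numeric field, if present, must not be negative."""
    def rule(rec: Record) -> Optional[str]:
        value = rec.get(name)
        if isinstance(value, (int, float)) and value < 0:
            return f"{name} is negative"
        return None
    return rule


@dataclass
class GateResult:
    accepted: List[Record] = field(default_factory=list)
    # Each quarantined entry keeps the record together with the reasons it failed.
    quarantined: List[Tuple[Record, List[str]]] = field(default_factory=list)


def quality_gate(records: List[Record], rules: List[Rule]) -> GateResult:
    """Split incoming records into accepted and quarantined sets."""
    result = GateResult()
    for rec in records:
        reasons = []
        for rule in rules:
            message = rule(rec)
            if message is not None:
                reasons.append(message)
        if reasons:
            result.quarantined.append((rec, reasons))
        else:
            result.accepted.append(rec)
    return result


if __name__ == "__main__":
    incoming = [
        {"customer_id": "c1", "amount": 10.0},
        {"customer_id": "", "amount": 5},
        {"customer_id": "c3", "amount": -2},
    ]
    gate = quality_gate(incoming, [require_fields("customer_id", "amount"),
                                   non_negative("amount")])
    print(len(gate.accepted), len(gate.quarantined))  # → 1 2
    for rec, reasons in gate.quarantined:
        print(rec, reasons)
```

Quarantining rather than silently dropping bad records keeps an audit trail, so the rules can be tightened or relaxed later without losing the underlying data.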

 

It takes a high level of expertise to make sense of the data

A lack of semantic consistency and governed metadata means that only specially trained experts will be able to reconcile the data. The average company may have a hard time finding people skilled in data-flow technologies like Spark and Flume. Beyond technical skills, data science expertise grounded in specific industries is critical for building the data models and algorithms that produce actionable insights.

 

The technology landscape is very confusing

A simple Google search for data lake products returns over a million hits. Everyone from tech giants like IBM, Microsoft, Google and Amazon to small startups has a significant “data lake” offering. Beyond this, there is the technology stack to consider. Do you choose Hadoop, in one of its many distributions, or a custom stack from one of the tech giants? Deciding what infrastructure your data lake needs, whether cloud or in-house, adds another dimension to this journey.

How you will manage and run a data lake on an ongoing basis is another decision point. Defining an effective data lake technology strategy and identifying the right partners and experts are therefore critical before moving ahead on this path.

Though there are some valid reasons for skepticism about data lakes, the technology itself is neutral. The fact is that data lakes can be a great resource for some companies. But everyone should be wary of the marketing pitch for any technology, and data lakes are no exception. The best advice is: take a very close look before you jump in.

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.