Dataconomy

The data lakehouse: just another crazy buzzword?

by Helena Schwenk
April 13, 2021
in Articles, Artificial Intelligence, Contributors

Data professionals have long debated the merits of the data lake versus the data warehouse. But this debate has become increasingly intense in recent times with the prevalence of data and analytics workloads in the cloud, the growing frustration with the brittleness of Hadoop, and hype around a new architectural pattern – the “data lakehouse.”

The data lakehouse is a relatively new paradigm: a hybrid data architecture that aims to combine the best of the data warehouse and the data lake. If the term is new to you, you’re not alone.

The terms explained

To fully understand how these terms fit into the overall data landscape, it’s worth unpicking their similarities and differences. 

To begin with, all three are used to manage operational and transactional data in support of business intelligence (BI) and analytical workloads across both business departments and developer functions. Digging into their specific definitions, however, reveals the different goals they serve.

Data warehouses, for example, are optimized for predefined and repeatable analytics queries over structured data that can be scaled across an organization. Because they are often used for business performance and regulatory reporting, data warehouses are highly governed data environments and are suited to high-performance, sometimes complex queries and high levels of concurrent access.

Data lakes collate unrefined structured and semi-structured data from multiple different sources and are subject to less rigorous data governance regimes. They often use cheaper and scalable storage where different processing styles and methods, including machine learning (ML) and batch-orientated workloads, are supported. However, data lakes are rarely optimized for the demands of production delivery – such as concurrency, latency, and workload management.
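The contrast between the two can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular product works: the standard-library `sqlite3` module stands in for a warehouse, a list of JSON strings stands in for files in a lake, and `RAW_EVENTS`, `load_warehouse`, `revenue_by_region`, and `lake_scan` are hypothetical names invented for the example.

```python
import json
import sqlite3

# Hypothetical raw events, as they might land in a data lake (semi-structured).
RAW_EVENTS = [
    '{"order_id": 1, "region": "EU", "amount": 120.0}',
    '{"order_id": 2, "region": "US", "amount": 80.5, "coupon": "SPRING"}',
    '{"order_id": 3, "region": "EU"}',  # no amount recorded
]


def load_warehouse(conn, raw_events):
    """Warehouse style: enforce a fixed schema at load time and reject bad rows."""
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rejected = []
    for line in raw_events:
        rec = json.loads(line)
        try:
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (rec["order_id"], rec["region"], rec["amount"]),
            )
        except (KeyError, sqlite3.IntegrityError):
            # Rows that do not fit the governed schema never reach the table.
            rejected.append(rec)
    return rejected


def revenue_by_region(conn):
    """A predefined, repeatable reporting query over the governed table."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows)


def lake_scan(raw_events, field):
    """Lake style: keep every record and interpret it only when it is read."""
    return [json.loads(line).get(field) for line in raw_events]
```

In this sketch the warehouse path drops the incomplete third order before anyone can query it, while the lake path keeps every record, including the `coupon` field that the warehouse schema never anticipated.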

Despite some apparent differences, overlaps between the two architectural patterns do exist. For example, a data lake can use approaches that employ star schemas for batch-orientated queries, and a data warehouse could be leveraged to operationalize data science with ML models running against governed data. 
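For readers unfamiliar with the term, a star schema places a central fact table alongside the dimension tables that describe it. The following is a minimal sketch using Python's standard-library `sqlite3` module; the table names, columns, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_star_schema():
    """Create one fact table and two dimension tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
        INSERT INTO dim_date VALUES (10, 2020), (11, 2021);
        INSERT INTO fact_sales VALUES
            (1, 10, 15.0), (1, 11, 20.0), (2, 11, 60.0), (2, 11, 40.0);
        """
    )
    return conn


def sales_by_category_and_year(conn):
    """Batch-style query: join the facts to their dimensions, then aggregate."""
    query = """
        SELECT p.category, d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return conn.execute(query).fetchall()
```

The same modeling pattern applies whether the tables live in a warehouse or are defined over files in a lake; only the engine that executes the join changes.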

Cutting through the data lakehouse hype

Conceptually, a data lakehouse is designed to combine the core elements of data warehousing with the core concepts of a data lake, for example, by pairing low-cost cloud storage for raw data with high-performance processing of ML, BI, and analytics workloads, along with data governance.
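To make the idea concrete, here is a deliberately simplified sketch of that combination, assuming a local directory as a stand-in for cloud object storage and the standard-library `sqlite3` module as a stand-in for a query engine. The `CATALOG` dictionary and the `write_raw`, `read_raw`, and `query` functions are hypothetical names for this example and do not correspond to any vendor's API.

```python
import json
import sqlite3
from pathlib import Path

# Hypothetical catalog: a governed schema layered over the raw files.
CATALOG = {"orders": {"order_id": "INTEGER", "region": "TEXT", "amount": "REAL"}}


def write_raw(storage_dir, table, records):
    """Append raw records as JSON lines (a local directory stands in for cloud storage)."""
    path = Path(storage_dir) / f"{table}.jsonl"
    with path.open("a", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
    return path


def read_raw(storage_dir, table):
    """ML-style access: read the untouched records, extra fields included."""
    path = Path(storage_dir) / f"{table}.jsonl"
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]


def query(storage_dir, table, sql):
    """BI-style access: expose the same files as a typed table and run SQL over it."""
    schema = CATALOG[table]
    columns = list(schema)
    conn = sqlite3.connect(":memory:")
    column_defs = ", ".join(f"{name} {kind}" for name, kind in schema.items())
    conn.execute(f"CREATE TABLE {table} ({column_defs})")
    placeholders = ", ".join("?" for _ in columns)
    # Only catalogued columns are projected; anything else stays in the raw file.
    rows = [tuple(rec.get(col) for col in columns) for rec in read_raw(storage_dir, table)]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn.execute(sql).fetchall()
```

In this sketch a single copy of the data serves both audiences: `read_raw` returns everything for experimental work, while `query` applies the catalogued schema before answering SQL. Production systems add much more than this, such as transactions, indexing, and access control.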

This might sound like a good idea, but the lakehouse is an emerging concept that is still misunderstood by many and subject to a lot of hype and speculation. 

Despite this, there are strong advocates on both sides of the data architecture divide. Those with a background in data warehousing position the lakehouse around relational technology concepts. Those on the data lake side have roots in ML and Spark processing, where support for Java, Python, and R workloads is a higher priority. Both, however, promote the use of the cloud for storage and analytical processing. 

It’s rarely an either/or decision

While the debate continues, the lakehouse is unlikely to remove the need for either the data lake or the data warehouse, at least in the short term, particularly for organizations that have made significant investments in either or both. As an emerging concept, it also has a lot of catching up to do on the decades of innovation we have seen in areas such as in-database analytics, query and performance optimization, and columnar storage and compression.

There is also still a sound argument for the co-existence of data warehouses and data lakes, where this gives businesses a basis to scale and democratize data and to rationalize their data ecosystems. A co-existence approach, in whatever combination, draws on the strengths of each architectural design to serve a wider range of use cases than any one of these architectures can support independently.

Prioritize flexibility 

Against the backdrop of an ever-changing and complex data landscape, data professionals need to ensure that the data warehouses and/or data lakes in their existing environment work together rather than against each other. For example, the data warehouse can provide well-defined and repeatable data analytics while the data lake supports more experimental or developer-led ML use cases that draw on a wider pool of data. Combining both gives organizations the ability to support different use cases and different audiences, such as business users and data scientists, and to apply different levels of data governance, data curation, and data quality.

Exactly where and how a data lakehouse fits in this environment remains to be seen. The concept is still untested by the market at large, and the promise of a one-size-fits-all approach is likely to be a step too far for organizations that have invested significantly in data lakes and warehouses. It is, however, an important debate to have in an innovative and fast-moving data infrastructure market that continues to evolve.

Tags: Big Data, data lake, Data Warehouse, surveillance, USA

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
