Dark data arises when organizations collect more data than they can analyze. Humans generate an ever-increasing amount of data, and little of it is ever erased, for fear of missing something. But many businesses don’t even know what data is kept on their servers. And here lies a problem…

More data than ever before is produced, gathered, and analyzed nowadays, whether to fuel machine learning and artificial intelligence applications or because companies want to show their users more personalized advertising. Companies gather a lot, and they can do so because users and sensors frequently provide data to them voluntarily. However, the data gatherers don’t know if they will ever need that data. Whether it’s the user’s hair color or the exact moment they turn on the bathroom light in the morning, companies capture it all out of fear of missing something. If not now, then perhaps tomorrow, there is still a chance that this data can be used. But for now, it causes a surprising range of problems and even contributes to global warming.

What is dark data?

Before going deeper into the subject, let’s briefly recall what dark data is: information that is gathered through various computer network operations but is never used to gain knowledge or make decisions. When an organization’s capacity for data collection exceeds its capacity for data analysis, data becomes dark. We will explore how data turns dark later on. But it’s important to note that, in most cases, companies don’t even know where their dark data is stored.

Hard truths about dark data 

Cisco found that a city of a million people generates 200 petabytes (200 million gigabytes) of data daily thanks to connected smart homes, vehicles, and other sensors. The surprise is that companies use only 0.1% of this data. This indicates that big data collection binges produce massive volumes of data, most of which initially remain unused and, in the worst case, turn into “dark data.”
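To put those figures in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 petabyte = 1,000,000 gigabytes, matching the conversion above) and takes the 0.1% usage rate at face value. The variable names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# Assumption: decimal units, so 1 PB = 1,000 TB = 1,000,000 GB.
GB_PER_PB = 1_000_000
TB_PER_PB = 1_000

generated_pb_per_day = 200   # data generated daily by a city of a million people
used_share = 0.001           # 0.1% of that data is actually used

generated_gb_per_day = generated_pb_per_day * GB_PER_PB
used_tb_per_day = generated_pb_per_day * TB_PER_PB * used_share
unused_pb_per_day = generated_pb_per_day * (1 - used_share)

print(f"Generated: {generated_gb_per_day:,} GB per day")
print(f"Used:      {used_tb_per_day:,.0f} TB per day")
print(f"Unused:    {unused_pb_per_day:,.1f} PB per day")
```

In other words, of the 200 petabytes generated each day, roughly 200 terabytes are put to use and about 199.8 petabytes are not.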

Almost everyone my age has an unused floppy disk with data lying around the house. But modern computers no longer have the drives to read it. Even old email accounts are frequently hard to access, either because the service provider has ceased operations or because you have forgotten your password and the account has been restricted or deleted after years of inactivity. However, the data you left behind doesn’t vanish.

Several sources, such as Veritas and Splunk, have found that around 52% of enterprise data is dark. This means businesses are completely unaware of its existence, let alone where to look for it or how to handle, process, and use it. IDC states that half of the 175 zettabytes of data available globally is dark data.


On average, 52% of a company’s data is “dark,” meaning no value has been assigned to it


The expense of storing an enormous amount of data is another consideration: in addition to the cost of the storage media, there are also electricity costs. The price rises further if additional copies are made purely out of caution. Storing all this data also generates enormous volumes of CO2. According to research by Veritas, dark data results in more than 6.4 million tons of CO2 emissions annually, because huge amounts of uselessly archived junk data are dumped on data centers.

Dark data can lead to legal complications too. Under the General Data Protection Regulation (GDPR), personal data must be erased without undue delay once they are “no longer necessary in relation to the purposes for which they were collected or otherwise processed.” But who is in charge of so-called orphaned data, which no one can access anymore? Who is responsible for it, who owns it, and who would even dare to erase it?

According to the New York Times, data centers reportedly waste 90% of the energy they use. Electricity costs could be reduced if unneeded data weren’t kept. In addition, there are costs related to missed opportunities and inadequate use of information. In the storage environments of EMEA enterprises, dark data makes up 54%, redundant, obsolete, and trivial data 32%, and business-critical data 14%. This resulted in $891 billion in avoidable storage and management expenses in 2020.

Dark data has an interesting nature. If not processed in a timely manner, useful data may become irrelevant and turn dark. For instance, if a company knows a customer’s geolocation, it may be able to provide offers based on that location. But if this data is not processed at once, it can become obsolete. According to IBM, around 60% of data loses its value instantly, which makes the subject even more interesting.

Risks of dark data

Persistent storage of dark data can put an organization at risk, especially if the material is sensitive. A breach may have severe financial and legal consequences and can drastically harm an organization’s reputation. A breach of customers’ private records could expose sensitive data and result in identity theft. Another example is the compromise of confidential company data, such as research and development material. These risks can be reduced by assessing and reviewing whether this data serves the organization, employing strong encryption, and erasing it irrecoverably if it needs to be deleted.
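The review step can start very simply: list the files nobody has touched for a long time. The sketch below is one possible starting point, not a complete audit tool. It assumes that a file's last-modified time is a usable proxy for "still in use" and that a one-year threshold is reasonable; the function name find_stale_files and its parameters are made up for this example.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400

def find_stale_files(root, max_idle_days=365, now=None):
    """Return (path, idle_days, size_bytes) for files not modified within max_idle_days.

    Assumption: modification time stands in for "last used". Real reviews
    would also look at ownership, content, and retention rules.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanished or cannot be read
            if info.st_mtime < cutoff:
                idle_days = int((now - info.st_mtime) // SECONDS_PER_DAY)
                stale.append((str(path), idle_days, info.st_size))
    # Longest-idle files first, so the oldest candidates are reviewed first.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

Calling find_stale_files("/path/to/share") returns review candidates only. Whether a file is kept, encrypted, or erased remains a decision for whoever is responsible for the data.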

How to use dark data?

One way to utilize dark data is through metadata, which can be described as data about data. Metadata can unlock data’s relevance and usefulness by helping to identify, link, curate, and cross-reference it. Metadata can classify data by project, customer, workflow, status, and other criteria and relate it to important organizational components. The amount of metadata attached to a data asset can also serve as a rough indicator of how valuable that asset is.
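As a small illustration of classifying data by its metadata, consider the sketch below. The records, field names, and the "richness" score are all hypothetical and invented for this example; counting filled-in fields is only a crude proxy for value, not an established metric.

```python
from collections import defaultdict

# Hypothetical metadata records; the field names are illustrative only.
records = [
    {"file": "contract_041.pdf", "customer": "Acme", "project": "Onboarding", "status": "approved"},
    {"file": "scan_0007.tif"},  # no descriptive metadata at all
    {"file": "invoice_118.pdf", "customer": "Acme", "status": "draft"},
    {"file": "notes.txt", "project": "Onboarding"},
]

def group_by(records, field):
    """Index file names by one metadata field; untagged records land under None."""
    index = defaultdict(list)
    for record in records:
        index[record.get(field)].append(record["file"])
    return dict(index)

def metadata_richness(record, fields=("customer", "project", "workflow", "status")):
    """Share of the expected fields that are filled in (0.0 to 1.0)."""
    filled = sum(1 for name in fields if record.get(name))
    return filled / len(fields)

by_customer = group_by(records, "customer")
richness = {record["file"]: metadata_richness(record) for record in records}

print(by_customer)
print(richness)
```

Files that land under None, or that score close to zero, have little to connect them to a customer, project, or workflow, which makes them the most likely to stay dark.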

Metadata acts as a set of omnidirectional headlights in the dark data, illuminating associations and links between objects and people with great specificity and relevance. In addition to ensuring consistency in the use, storage, and sharing of information, metadata can shed light on these linkages across one or more repositories or associated applications, such as an ERP or CRM system. It gives clear, concise visibility into the data’s origins and history, which helps ensure, for example, that workflows and business processes are effectively followed and administered.

Additionally, metadata may include details about a document’s creation, revision, archival, retention, and destruction, as well as the people, systems, and programs involved. These specifics may include the precise timestamp of changes and actions, such as reviews and approvals, and the required access permissions.
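A lifecycle record of this kind could be modeled along the following lines. This is a minimal sketch under stated assumptions: the class names DocumentMetadata and LifecycleEvent, the action labels, and the role-based access check are all hypothetical and do not come from any particular document management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LifecycleEvent:
    action: str          # e.g. "created", "revised", "approved", "archived"
    actor: str           # person, system, or program involved
    timestamp: datetime  # precise time of the change or action

@dataclass
class DocumentMetadata:
    document_id: str
    allowed_roles: set = field(default_factory=set)  # required access permissions
    events: list = field(default_factory=list)       # full lifecycle history

    def record(self, action, actor, when=None):
        """Append a lifecycle event, defaulting to the current UTC time."""
        when = when or datetime.now(timezone.utc)
        self.events.append(LifecycleEvent(action, actor, when))

    def last_event(self, action):
        """Return the most recent event of the given kind, or None."""
        matches = [event for event in self.events if event.action == action]
        return max(matches, key=lambda event: event.timestamp) if matches else None

    def can_access(self, role):
        """Check a role against the document's access permissions."""
        return role in self.allowed_roles

# Illustrative usage with made-up values.
doc = DocumentMetadata("contract-041", allowed_roles={"legal"})
doc.record("created", "alice", datetime(2023, 1, 5, 9, 0, tzinfo=timezone.utc))
doc.record("approved", "review-bot", datetime(2023, 2, 1, 14, 30, tzinfo=timezone.utc))
print(doc.last_event("approved").actor)
```

Keeping the history as a list of timestamped events, rather than a single "last modified" field, is what makes it possible to answer who did what and when.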

What does the future of dark data look like?

The value of dark data is expected to increase as more sophisticated computing tools for data analysis are developed. The new industrial revolution will be built on data and analytics. This also applies to dark data, much of which goes unused today because there aren’t enough resources to process it. Future use of the gathered data could increase productivity and enable businesses to satisfy consumer demand even better. Technological advances are making it possible to use this dark data affordably.

Additionally, many organizations are still unaware of the significance of dark data. Take healthcare and education organizations, for instance, which deal with vast volumes of data that have the potential to greatly improve the way they serve their consumers and patients. Undoubtedly, the ability to control and derive value from dark data will determine how it evolves in the future. We’ll discover this through experience.
