Uncontrolled data collection and storage have economic, ecological, and social drawbacks. The solution to those challenges is data sustainability!

Be careful what you wish for

Our digital world is built on data. The adage of recent years was that the more data you collect, the better. After all, “Big Data” promises a glittering treasure trove that can improve profits and innovation. However, it is not enough to gather data; you must also analyze it, understand it, and draw insights from it.

Data volumes are exploding, making it increasingly difficult for businesses and institutions to handle or analyze the data they have. According to research by Splunk, 57 percent of business and IT professionals say that the amount of data is increasing at such a breakneck speed that their organization can no longer keep up. It’s time to rethink, and the solution is to think long-term. What approach should we take?

The answer is long-term sustainability.

You might be thinking to yourself, “Sustainability? Isn’t that something about environmental protection?” Yes, you are correct, but we’re talking about economically oriented sustainability here. It is not about generating profits and then investing them in environmental and social projects; it’s about generating profits in an ecologically and socially acceptable way.

So, when it comes to data sustainability, there are a few things to consider, which we shall explore in greater detail below:

  • the economic aspect
  • the ecological aspect
  • the social aspect

The uncontrolled collection and storage of data cause high costs and data protection troubles


It is an expensive hobby to collect all the data you can

Data collection and storage come with high costs, many of which we are unaware of. Before we get into the specifics, here are some quick definitions. Data falls into three categories: business-critical data (clean data), ROT data, and dark data.

  • Business-critical data is information that is critical to a company’s financial stability and development. It is data that is actually used, hence the name “clean data.” It represents 14% of the whole.
  • ROT stands for “redundant, obsolete, and trivial data.” 32% of the data is ROT data: information that exists multiple times or is not needed for other reasons, and is therefore worthless.
  • Dark data is unclassified data, or data that has been unused for a long time, whose content and usefulness are unknown. This group’s share is rapidly increasing and stood at 54% as recently as last year.

Data storage and management are both costly and time-consuming, as the following example shows: a firm with only 250 terabytes of data must spend around $1.25 million each year on data storage. This breaks down as follows:

  • $175,000 to store clean data (14%)
  • $400,000 to store ROT data (32%)
  • $675,000 to store dark data (54%)
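The breakdown above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the figures come from the example itself, while the function and variable names are our own illustrative choices.

```python
# Figures from the example above: 250 TB stored for about $1.25 million a year.
TOTAL_TB = 250
TOTAL_COST_USD = 1_250_000

# Share of each data category in percent, as given in the definitions above.
SHARES_PERCENT = {"clean": 14, "ROT": 32, "dark": 54}


def cost_breakdown(total_cost_usd, shares_percent):
    """Split a yearly storage bill by category share (whole dollars)."""
    return {
        name: total_cost_usd * pct // 100
        for name, pct in shares_percent.items()
    }


breakdown = cost_breakdown(TOTAL_COST_USD, SHARES_PERCENT)
cost_per_tb = TOTAL_COST_USD // TOTAL_TB
avoidable = breakdown["ROT"] + breakdown["dark"]

for name, cost in breakdown.items():
    print(f"{name:>5}: ${cost:,}")
print(f"implied cost per terabyte: ${cost_per_tb:,}")
print(f"spent on ROT and dark data: ${avoidable:,}")
```

Under these numbers, the implied price is $5,000 per terabyte per year, and $1,075,000 of the yearly bill goes to data that is not business-critical.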

As a reminder, only clean data is important for the company’s survival; the rest is, excuse the pun, a waste of money. And cost is not the only issue. The assumed 250 terabytes represent roughly 580 million files. Going by the shares above, more than half of these files, or just over 300 million, are entirely unknown to the enterprise. Many of them contain sensitive personal information, which makes them relevant to the GDPR…

Data Sustainability

The social aspect of data

According to the Splunk poll cited above, 80% of CEOs consider data a key success indicator. This is also reflected in the price of data. However, the only ones who do not take part in this trade are us, the data producers. We can use services and platforms supposedly for free, but the price we pay for them is our data, handed over in a nontransparent setting. We, the users, have lost control of our data, and we are frequently unaware of the ramifications since the consequences are typically indirect. As a result, our privacy is violated every now and then by some company we handed our data to in exchange for a service or a product.

An expensive hobby fueled by CO2

The amount of data collected and stored is increasing at an accelerating pace. Every year, the amount of data worldwide grows by about 27%, according to IWD. To put the scale into perspective: one zettabyte is the equivalent of a billion terabytes. A 90-minute film in standard quality requires about 500 megabytes of storage, so a zettabyte is equal to 2 trillion films…
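The zettabyte comparison is easy to check yourself. The sketch below assumes decimal (SI) units, that is, 1 MB = 10^6 bytes and 1 ZB = 10^21 bytes; the constant names are our own.

```python
# Assumption: decimal (SI) units rather than binary ones.
BYTES_PER_MB = 10**6
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21

# Film size taken from the text: about 500 MB for 90 minutes in standard quality.
FILM_SIZE_MB = 500

terabytes_per_zb = BYTES_PER_ZB // BYTES_PER_TB
films_per_zb = BYTES_PER_ZB // (FILM_SIZE_MB * BYTES_PER_MB)

print(f"terabytes in one zettabyte: {terabytes_per_zb:,}")
print(f"films in one zettabyte:     {films_per_zb:,}")
```

With these assumptions, one zettabyte holds a billion terabytes, or 2 trillion films of 500 MB each.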

The IWD predicts that worldwide data volume will reach 175 zettabytes in 2025. This enormous amount of data presents us with a further problem: powering the data centers that hold it means producing vast amounts of CO2.


The uncontrolled data collection and storage is an ecological mistake



From 2010 to 2021, the energy consumption of German data centers increased by 15% to 12 billion kWh/a, or approximately 2% of Germany’s total electricity use. The growth rate is increasing: by 2025, electricity consumption is expected to be around 16.4 billion kWh/a. As a reminder, dark data accounts for over 50% of the data in this statistic.

According to projections, the global energy demand for storing dark data in 2020 resulted in the emission of 5.8 million tonnes of carbon dioxide. That is comparable to the amount of CO2 emitted by a car traveling around the globe 575,000 times.
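As a rough plausibility check of the car comparison, we can work out the per-kilometre emissions it implies. The only number below that does not come from the text is the Earth’s equatorial circumference of roughly 40,075 km, which is our assumption for what “around the globe” means.

```python
# From the text: emissions attributed to storing dark data in 2020.
DARK_DATA_CO2_TONNES = 5_800_000
TRIPS_AROUND_GLOBE = 575_000

# Assumption (not in the text): one trip equals the equatorial circumference.
EARTH_CIRCUMFERENCE_KM = 40_075

GRAMS_PER_TONNE = 1_000_000

tonnes_per_trip = DARK_DATA_CO2_TONNES / TRIPS_AROUND_GLOBE
grams_per_km = tonnes_per_trip * GRAMS_PER_TONNE / EARTH_CIRCUMFERENCE_KM

print(f"CO2 per trip around the globe: {tonnes_per_trip:.1f} tonnes")
print(f"implied car emissions: {grams_per_km:.0f} g CO2 per km")
```

The comparison implies roughly 10 tonnes of CO2 per trip, or about 250 grams per kilometre.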

Data sustainability as an image factor

Why collect data that you do not need or can’t use? Why should the GDPR be considered a cost center rather than a competitive advantage with which you can credibly improve your image? There is enormous potential here for European businesses, because the valuable resource isn’t the data but the insights that can be drawn from it.

Instead of collecting all the data two or three times over on central servers and analyzing it there with algorithms, it would be much cheaper and more effective to send the algorithms to where the data is stored. I can almost hear you asking how. The answer is deceptively simple: we can explore data right there on the spot with the supercomputers in our pockets, our smartphones. Thanks to a decentralized architecture, we can access information directly on any end device, cutting down on data movement and allowing greater insight retrieval.
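To make the idea of “sending the algorithm to the data” more concrete, here is a minimal sketch. It is not the actual polyPod API: the Device class, its consent flag, and the function names are hypothetical and exist only for illustration. Each device runs a small function on its own records and hands back only an aggregate, so the raw data never leaves the device.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Device:
    """Hypothetical end device that keeps its raw records local."""
    owner: str
    records: List[float] = field(default_factory=list)
    consent: bool = True  # illustrative assumption: the owner can opt out

    def run(self, algorithm: Callable[[List[float]], Tuple[int, float]]
            ) -> Optional[Tuple[int, float]]:
        """Run a visiting algorithm locally; return None without consent."""
        if not self.consent:
            return None
        return algorithm(self.records)


def local_summary(records: List[float]) -> Tuple[int, float]:
    """The visiting algorithm: reduce raw records to (count, sum)."""
    return (len(records), sum(records))


def average_across(devices: List[Device]) -> Optional[float]:
    """Combine per-device summaries into one average, never seeing raw records."""
    summaries = [device.run(local_summary) for device in devices]
    answered = [s for s in summaries if s is not None]
    count = sum(c for c, _ in answered)
    total = sum(t for _, t in answered)
    return total / count if count else None


devices = [
    Device("alice", [2.0, 4.0]),
    Device("bob", [6.0]),
    Device("carol", [100.0], consent=False),  # opted out, so not included
]
average = average_across(devices)
print(average)
```

Only the two small tuples from the consenting devices travel over the network in this sketch; a central server would instead have had to receive and store every record.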


The uncontrolled collection and storage of data have many disadvantages. Yet data is essential for our society and therefore indispensable. What is the alternative, then?



That is where we, polypoly, come in with polyPod. This is GDPR as technology, built by us, a data cooperative owned by European citizens, and created to give people more control over their personal information. The polyPod makes economic sense because only the precise information required is obtained rather than the raw data. The savings potential is significant!

The knowledge generated by polyPod can be used to improve productivity, safety, and security. polyPod is environmentally beneficial because it reduces CO2 emissions. The CO2 savings potential is not overwhelming, but it is still significant. polyPod is also social since it generates a digital income for citizens. When the algorithms arrive at the end devices, users can decide who gets access to their machine’s computing power and insights. Thus, we create data sustainability while making data visible and meaningful, and keeping users happy!
