With a growing body of compliance law in recent years governing how organizations must manage, retain and eventually delete customer data, it is vital that policies and processes are put in place so that obligations can be met and individual rights respected.
For instance, one common challenge is what must happen to data when a customer leaves. The old-school “delete everything after X years” approach is no longer fit for purpose when data plays such an important role in contemporary business.
Today, organizations must create and implement data retention strategies that are more nuanced and focused, balancing the value that retained data can add in areas such as business insight, intelligence, and personalization against the requirement to delete it when retention is no longer justified.
How long is too long?
In the UK, the Information Commissioner’s Office (ICO) sets out data storage limitation guidelines that stipulate that organizations must not keep data for longer than they need. An example cited by the ICO illustrates banks’ issues in retaining customer data: “A bank holds personal data about its customers. This includes details of each customer’s address, date of birth, and mother’s maiden name. The bank uses this information as part of its security procedures. It is appropriate for the bank to retain this data for as long as the customer has an account with the bank. Even after the account has been closed, the bank may need to continue holding some of this information for legal or operational reasons for a further set time.”
In practice, banks must be able to justify how long they retain data and put in place policies that set out their retention periods. The data they hold must be reviewed regularly and erased or anonymized when it is no longer needed. Personal data can only be retained for longer periods if it is for “public interest archiving, scientific or historical research, or statistical purposes.”
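A retention policy of this kind can be expressed as data and checked automatically. The following is a minimal Python sketch, not a definitive implementation: the category names, the retention periods, and the `retention_action` helper are all hypothetical and chosen purely for illustration. They are not legal guidance and do not come from the ICO.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    category: str                  # hypothetical data category name
    keep_after_closure: timedelta  # assumed period; illustrative only
    justification: str             # why the data is kept for that long


# Hypothetical retention schedule; a real one would come from legal review.
SCHEDULE = {
    "security_questions": RetentionRule(
        "security_questions", timedelta(days=0),
        "Only needed while the account is open"),
    "transaction_history": RetentionRule(
        "transaction_history", timedelta(days=6 * 365),
        "Assumed record-keeping period after closure"),
}


def retention_action(category: str, closed_on: Optional[date], today: date) -> str:
    """Return 'retain' or 'erase_or_anonymize' for one category of data."""
    rule = SCHEDULE[category]
    if closed_on is None:
        # The account is still open, so retention remains justified.
        return "retain"
    if today - closed_on >= rule.keep_after_closure:
        return "erase_or_anonymize"
    return "retain"
```

Running such a check on a schedule, rather than once, is one way to make the regular review part of the process instead of an afterthought.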
Less than a decade ago, the ‘right to erasure’ or ‘right to be forgotten’ was established as a human right by the European Court of Justice. This means people can ask organizations that hold data about them to remove it. These individual rights are also set out in Article 17 of the GDPR, enforced in the UK by the ICO. It lists a range of criteria under which data must be deleted, including “the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed.”
Identification and removal of data to mitigate compliance risks
Together with the other grounds for deletion, including lack of consent or unlawful processing, this creates a broad range of scenarios where data must be identified and removed. Given the sheer number of systems that most data-hungry organizations now run, working out where that data resides, let alone fully deleting it, is becoming a mammoth task.
Failure to comply with these rules can result in heavy financial penalties. In May this year, for example, Google was fined €10 million by authorities in Spain for serious GDPR breaches: passing data to a third party without a legal basis and obstructing the rights of individuals to have their data erased.
Meeting these obligations has become increasingly challenging in recent years, with governance and privacy teams tasked with delivering the processes and technologies needed to fully identify specific customer datasets.
As a result, building the technology infrastructure to map stored data across all systems to an “entity,” in this case, a customer, and correlate all data to that entity is becoming critical to meeting compliance obligations. After all, there is no way to prove beyond doubt that customer data has been deleted from all systems unless it can be demonstrated that the “entity” data no longer exists in any of them. This requirement can create compliance issues in many organizations, especially when their systems, tools, and processes are not designed to provide these increasingly crucial capabilities.
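The idea of proving deletion by showing that the entity no longer exists anywhere can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: each system is assumed to expose a lookup that returns the locations of any records tied to an entity ID, and the names `find_residual_data` and `deletion_verified` are hypothetical, not part of any particular product.

```python
from typing import Callable, Dict, List

# Assumption: every system can be queried by entity ID and returns the
# locations (table, file, row reference) of any records it still holds.
SystemLookup = Callable[[str], List[str]]


def find_residual_data(entity_id: str,
                       systems: Dict[str, SystemLookup]) -> Dict[str, List[str]]:
    """Return the systems that still hold data for the entity."""
    residual: Dict[str, List[str]] = {}
    for name, lookup in systems.items():
        hits = lookup(entity_id)
        if hits:
            residual[name] = hits
    return residual


def deletion_verified(entity_id: str, systems: Dict[str, SystemLookup]) -> bool:
    """Deletion is demonstrated only if no system reports residual data."""
    return not find_residual_data(entity_id, systems)


# In-memory stand-ins for real systems, for illustration only.
crm_records = {"cust-42": ["contacts/row 7"]}
billing_records: Dict[str, List[str]] = {}
example_systems: Dict[str, SystemLookup] = {
    "crm": lambda entity: crm_records.get(entity, []),
    "billing": lambda entity: billing_records.get(entity, []),
}
```

With these stand-ins, `find_residual_data("cust-42", example_systems)` reports the CRM record that still needs removing. The check is only as complete as the list of systems passed in, which is why the mapping of data to entities has to cover every system.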
Automating data discovery
The answer lies in implementing a correlated “entity” based retention solution capable of working across multiple systems and datasets to address these challenges. By automating discovery and classification across disparate data silos, it becomes possible for organizations to implement effective retention and removal policies based on that “entity.”
Ideally, organizations should be able to create a virtual customer identity, with the ability to select attributes that reflect what a customer looks like in their systems. For example, the organization could select columns from multiple databases to create a gold-image or master data set of customer data. This would include information such as name, email, address, phone number, and account number, and the solution would then correlate these data points across all systems.
Instead of just looking for specific patterns that inevitably return a narrow set of results, the most effective and innovative emerging systems take those additional data points and then apply machine learning to automate and find every instance, whether it fits within a pre-defined pattern or not. Using those reference points and effectively creating a virtual database of all the data points that relate to each customer gives organizations a much more granular view. Where legacy systems would have delivered perhaps 60% accuracy, these newer systems can be up to 95% effective in correlating the relevant data sets for removal.
This also creates synergy between technology and policy, using the right tool sets to ensure that policy can fit both the business purpose and the regulatory need. At the same time, the enforcement of retention policies can be automated so organizations can move the right data beyond use at the right time.
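To make the correlation step concrete, here is a simplified Python sketch. It assumes a master record of customer attributes and a set of systems whose rows use different column names. It matches on normalized values rather than column names, and it uses plain exact matching after normalization, which is a much simpler stand-in for the kind of automated discovery a real solution would perform. The function names and the `min_matches` threshold are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Lowercase the value and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def correlate(master: Dict[str, str],
              systems: Dict[str, List[Dict[str, str]]],
              min_matches: int = 2) -> List[Tuple[str, int]]:
    """Return (system, row index) pairs sharing enough values with the master record."""
    wanted = {normalize(v) for v in master.values()}
    wanted.discard("")  # ignore attributes that normalize to nothing
    hits: List[Tuple[str, int]] = []
    for system, rows in systems.items():
        for index, row in enumerate(rows):
            found = {normalize(str(v)) for v in row.values()} & wanted
            if len(found) >= min_matches:
                hits.append((system, index))
    return hits


# Illustrative data only; column names differ from system to system.
master_record = {"name": "Ada Lovelace", "email": "ada@example.com",
                 "phone": "+44 20 7946 0000"}
example_data = {
    "crm": [{"full_name": "ADA LOVELACE", "mail": "Ada@Example.com"},
            {"full_name": "Bob Smith", "mail": "bob@example.com"}],
    "support": [{"contact": "ada@example.com", "tel": "442079460000"}],
    "marketing": [{"email": "ada@example.com"}],
}
```

Here `correlate(master_record, example_data)` flags the first CRM row and the support row. The marketing row shares only one attribute, so it falls below the assumed threshold of two matches. Choosing that threshold is a trade-off between missing records and falsely linking them to the wrong customer.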
The result is a transformational capability that empowers data-centric organizations to join the dots across their datasets to meet audit and compliance requirements.
This enables organizations to reduce the risk surface of the data they store and process while also minimizing the volumes of data they are dealing with. In doing so, it becomes possible to optimize the cost and complexity of technology infrastructure to deliver a strategy that meets governance needs.
As data volumes and complexity increase, so do the risks for organizations that do not establish the processes or technology to effectively manage the entire data lifecycle – including the stage at which data must be deleted. Without a greater emphasis on closing the loop, there are certain to be many more breaches of governance rules in the years ahead. Putting the right tools in place, however, puts organizations in a strong position to balance the value of the data they hold with the rights of individuals across the digital economy.