Machine unlearning techniques are emerging as a way to cleanse generative AI models of unwanted material, such as sensitive personal data or copyrighted content, that they may have absorbed during training. However, these methods come with significant drawbacks. A recent collaborative study by researchers at the University of Washington, Princeton, the University of Chicago, USC, and Google highlights a troubling trade-off: in removing the unwanted data, these techniques can severely impair a model’s ability to perform basic tasks.
The findings suggest that prevailing unlearning methods could make advanced models such as OpenAI’s GPT-4 or Meta’s Llama 3.1 405B far less capable of answering even elementary questions, often to the point of being practically unusable.
What is machine unlearning?
Machine unlearning is a relatively new concept in the field of artificial intelligence, particularly concerning large language models (LLMs). In simple terms, machine unlearning is the process of making a machine learning model forget specific data it has previously learned. This becomes crucial when the data includes sensitive private information or copyrighted material that should not have been included in the training set initially.
For readers new to the field: machine learning, a cornerstone of artificial intelligence, trains computers to interpret data and make decisions. It is divided primarily into three types: supervised, unsupervised, and reinforcement learning.
Supervised learning uses labeled data (examples with known outcomes) to train models to make predictions. This method is akin to learning with an answer key and is ideal for:
- Classification tasks, such as identifying whether an email is spam.
- Regression tasks, like forecasting real estate prices.
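As a rough illustration of the regression case, the sketch below fits a straight line to a handful of labeled examples and uses it to forecast a price. The data and the `fit_line` helper are hypothetical, chosen only to show the idea of learning from examples with known outcomes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: floor area (square meters) -> price (thousands).
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept  # forecast for an unseen 100 m^2 home
print(round(predicted, 1))
```

The "answer key" here is the list of known prices: the model only ever sees inputs paired with their correct outputs.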
Unsupervised learning operates without labeled data, allowing the model to identify patterns and structures on its own. It’s similar to self-study without explicit guidance, useful for:
- Clustering, where the model groups similar data points together, such as customer segmentation.
- Association, which finds items that tend to occur together, as in market basket analysis, where customers who buy one item often buy another.
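To make the clustering case concrete, here is a minimal sketch of k-means in one dimension. The customer spend figures, the starting centers, and the `kmeans_1d` helper are all assumptions made for illustration; no labels are given to the algorithm.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm in one dimension with given starting centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            # Assign each point to its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend per customer; no labels are provided.
spend = [12, 15, 14, 95, 102, 99]
centers, groups = kmeans_1d(spend, centers=[0.0, 50.0])
print(groups)
```

The two segments (low and high spenders) emerge from the data alone, which is the sense in which the learning is unsupervised.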
Reinforcement learning involves learning through trial and error, using rewards or penalties to shape the behavior of an agent within a decision-making process. It mimics the way a trainer might use treats to teach a dog new tricks, applicable in:
- Gaming and simulations, where agents learn strategies to win.
- Robotic movements, for tasks requiring a sequence of precise actions.
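Trial-and-error learning can be sketched with a two-armed bandit, one of the simplest reinforcement learning settings. The reward probabilities, the epsilon-greedy strategy, and the `run_bandit` helper are assumptions chosen for illustration, not something the article prescribes.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update of the estimated value of this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical environment: the second action pays off far more often.
values, counts = run_bandit([0.2, 0.8])
print(counts)
```

The agent is never told which action is better; the rewards (the "treats") gradually pull its behavior toward the more rewarding one.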
Each learning type leverages unique approaches to digest and process information, chosen based on the specific requirements and data availability of the task.
The challenge of unlearning
Language models are trained using massive pools of text data gathered from various sources. This data could inadvertently include private details or copyrighted content. If a data owner (the individual or entity that owns the rights to a dataset) identifies their data within a model and wishes for its removal—perhaps due to privacy concerns or copyright infringement—the ideal solution would be to simply remove this data from the model.
However, completely removing specific data from a language model that has already learned from billions of other data points is not straightforward. Exact unlearning means retraining the model from scratch without the data in question, so that the result is as if that data had never been part of training. For modern, large-scale models this is typically “intractable,” or impractical, because of their size and the vast amount of data they are trained on.
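The idea of exact unlearning can be shown on a toy model where retraining is cheap. The sketch below uses a hypothetical three-document corpus and a word-pair counting "model" (`train_bigram_counts` is an illustrative helper, not a real library function); for an actual language model, the same step would mean repeating the entire training run.

```python
from collections import Counter

def train_bigram_counts(documents):
    """'Train' a toy model: count how often each word follows another."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

# Hypothetical corpus; the last document is the one its owner wants forgotten.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "alice lives at 12 elm street",
]
forget = corpus[2]

original = train_bigram_counts(corpus)
# Exact unlearning: retrain from scratch on everything except the forget set.
retrained = train_bigram_counts([d for d in corpus if d != forget])

print(original[("elm", "street")], retrained[("elm", "street")])
```

The retrained model carries no trace of the forgotten document while its knowledge of the other documents is untouched, which is the standard approximate methods are measured against.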
Approximate machine unlearning algorithms
Due to the challenges of exact unlearning, researchers have developed several “approximate unlearning algorithms.” These are methods designed to remove the influence of unwanted data from a model without needing to rebuild the model from scratch. However, evaluating the effectiveness of these algorithms can be tricky. Historically, evaluations have been limited, not fully capturing whether these algorithms successfully meet the needs of both the data owners (who want their data forgotten) and the model deployers (who want their models to remain effective).
Introducing MUSE
To address these evaluation challenges, the study proposes MUSE, a comprehensive benchmark for evaluating machine unlearning. MUSE tests unlearning algorithms against six criteria, which are considered desirable properties for a model that has undergone unlearning:
- No verbatim memorization: The model should not remember exact phrases or sentences.
- No knowledge memorization: It should not retain detailed knowledge derived from the specific data.
- No privacy leakage: It should not leak any private information.
- Utility preservation: The model should still perform well on other data not targeted for removal.
- Scalability: It should handle large and multiple requests for data removal efficiently.
- Sustainability: It should manage successive unlearning requests without deteriorating in performance.
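As a rough sketch of how the first criterion might be checked, the code below scores how much of a forgotten passage a model reproduces, using a longest-common-subsequence overlap in the style of ROUGE-L. The scoring function, the example strings, and the helper names are assumptions for illustration; the benchmark's own metrics may be defined differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def verbatim_overlap(model_output, true_continuation):
    """Fraction of the true continuation recovered, in order, by the model."""
    out, ref = model_output.split(), true_continuation.split()
    return lcs_length(out, ref) / len(ref) if ref else 0.0

# Hypothetical strings standing in for forget-set text and model outputs.
reference = "the quick brown fox jumps over the lazy dog"
before = "the quick brown fox jumps over the lazy dog"
after = "a small animal ran past the fence"

print(verbatim_overlap(before, reference), verbatim_overlap(after, reference))
```

A score near 1 suggests the model is still reciting the text; a score near 0 suggests the verbatim memory is gone, although that alone says nothing about the other five criteria.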
How do you make a model unlearn?
Generative AI models do not possess what we might consider genuine intelligence. Rather, they are statistical systems that predict patterns in data such as text, images, speech, and video, having processed a multitude of examples such as movies, voice recordings, and essays. For instance, given the phrase “Looking forward…”, a model trained to auto-complete emails might finish it with “… to hearing back,” based purely on patterns it has seen repeatedly in its training data, without any semblance of human anticipation.
Most of these models, including GPT-4o, are trained primarily on data from publicly accessible websites and datasets. Developers defend the practice as ‘fair use,’ but it involves scraping data without the consent, compensation, or acknowledgment of the original owners, which has led to legal challenges from copyright holders seeking reform.
Against this backdrop, machine unlearning has risen to prominence. Recently, Google, alongside academic partners, launched a competition to encourage the development of new unlearning methods, which would allow sensitive content, such as medical records or compromising images, to be erased from AI models on request or legal demand. Because of how they are trained, these models often inadvertently capture private information ranging from phone numbers to more sensitive data. While some companies have introduced mechanisms that let owners exclude their data from future training, these do not extend to models already in use, which positions unlearning as a more comprehensive solution for data removal.
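The auto-complete example can be reproduced with a very small statistical model. The sketch below counts which word follows which in a few hypothetical emails and completes a prompt with the most frequent continuation; the `train` and `complete` helpers and the email text are assumptions for illustration, and real language models are vastly larger and more sophisticated.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count, for each word, which words follow it and how often."""
    following = defaultdict(Counter)
    for s in sentences:
        words = s.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def complete(following, prompt, max_words=3):
    """Extend the prompt by repeatedly picking the most frequent next word."""
    words = prompt.lower().split()
    for _ in range(max_words):
        options = following.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical training emails.
emails = [
    "looking forward to hearing back",
    "looking forward to hearing back soon",
    "looking forward to hearing from you",
    "looking forward to meeting you",
]
model = train(emails)
completion = complete(model, "Looking forward")
print(completion)
```

Nothing in the model "anticipates" a reply; it simply repeats the continuation it has counted most often.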
However, machine unlearning is not as straightforward as simply deleting a folder. Today’s unlearning techniques employ sophisticated algorithms designed to redirect the models away from the unwanted data. This involves subtly adjusting the model’s predictive mechanics to ensure it either never, or very seldom, regurgitates the specified data.
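One family of approximate techniques can be sketched as gradient ascent: instead of lowering the model's loss on the unwanted example, the update raises it, pushing probability away from that output. The toy below applies this to a three-token softmax model; the vocabulary, learning rate, and `step` helper are assumptions for illustration and are not taken from the study.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step(logits, target, lr):
    """One gradient step on the negative log-likelihood of `target`.
    lr > 0 descends (learn the target); lr < 0 ascends (unlearn it)."""
    probs = softmax(logits)
    # d(-log p[target]) / d logit_i = p_i - 1[i == target]
    return [z - lr * (p - (1.0 if i == target else 0.0))
            for i, (z, p) in enumerate(zip(logits, probs))]

vocab = ["hello", "thanks", "555-0100"]  # last token stands in for private data
logits = [0.0, 0.0, 0.0]

# "Training": the private token appears in the data, so the model learns it.
for target in [0, 1, 2, 2, 2] * 20:
    logits = step(logits, target, lr=0.1)
p_before = softmax(logits)[2]

# Approximate unlearning: gradient ascent on the forget example only.
for _ in range(50):
    logits = step(logits, 2, lr=-0.1)
p_after = softmax(logits)[2]

print(round(p_before, 3), round(p_after, 3))
```

In this toy the other tokens simply absorb the freed probability. In a large model the same kind of update changes weights shared across many capabilities, which is one plausible reason unlearning can degrade general performance.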
The study applied the six MUSE criteria to evaluate popular unlearning algorithms on language models with 7 billion parameters, using datasets such as the Harry Potter books and news articles. The results showed that while most algorithms could prevent verbatim and knowledge memorization to some extent, only one managed to do so without causing significant privacy leakage. Moreover, these algorithms generally fell short in maintaining the overall utility of the model, especially when handling large-scale or successive unlearning requests.
The findings highlight a critical gap in the practical application of unlearning algorithms: they often fail to meet the necessary standards for effective and safe data removal. This has significant implications for both privacy advocates and AI developers.
In summary, while machine unlearning is a promising field that addresses important ethical concerns in AI development, there is still much work to be done to make these techniques practical and reliable. The MUSE benchmark aims to aid this development by providing a robust framework for evaluating and improving unlearning algorithms.
Image credits: Kerem Gülen/Midjourney