Dataconomy

MIT and ETH Zurich unveil SDFT to stop AI from forgetting old skills

The technique enables continual learning, meaning a single model can stack diverse skills like medicine, law, and coding without losing previous knowledge.

by Kerem Gülen
February 18, 2026
in Research

Researchers at MIT, the Improbable AI Lab, and ETH Zurich have developed a new technique called self-distillation fine-tuning (SDFT) for large language models (LLMs).

SDFT enables LLMs to acquire new skills and knowledge without losing prior capabilities, addressing a problem known as catastrophic forgetting. The method lets a model learn both from demonstrations and from its own attempts by leveraging in-context learning. In experiments, SDFT outperformed traditional supervised fine-tuning (SFT) and avoided a key limitation of reinforcement learning (RL) algorithms: the need for an explicit reward function.

For enterprise applications, SDFT permits a single model to accumulate multiple skills while maintaining performance on earlier tasks. This allows AI agents to adapt to changing business environments, acquire proprietary knowledge, and gain new skills without extensive retraining or degradation of general reasoning abilities.


Current LLMs are static after deployment: they do not update their parameters to learn new skills or knowledge. Continual learning, which lets a model accumulate knowledge over time much as humans do, is necessary for AI that adapts.

On-policy learning, in which a model learns from data it generates itself and corrects its own errors, is an effective alternative to simply mimicking a static dataset. Without it, models are prone to catastrophic forgetting, losing previous knowledge when they learn new tasks.

Reinforcement learning (RL) is a form of on-policy learning, but it requires an explicit reward function to score outputs. That works well for tasks with clear outcomes, such as math, but defining such a function for many real-world enterprise scenarios, such as writing a legal brief, is difficult or impossible. RL also struggles to teach entirely new information, such as company-specific protocols: if the model lacks the knowledge to produce a correct answer in the first place, it never receives a positive signal to learn from.
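A toy calculation makes the "no positive signal" problem concrete. The sketch below assumes the simplest policy-gradient estimator (REINFORCE) and a hypothetical policy over four candidate answers, only one of which is correct; none of this setup comes from the paper. When no sampled answer earns a reward, every term in the estimate is multiplied by zero, so the update is zero and the model cannot learn.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_gradient(logits, samples, rewards):
    """REINFORCE estimate of the gradient of expected reward w.r.t. the logits.

    For a softmax policy, d log p(a) / d logit_k = 1[k == a] - p(k),
    and each sample's contribution is scaled by its reward.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, r in zip(samples, rewards):
        for k in range(len(logits)):
            indicator = 1.0 if k == a else 0.0
            grad[k] += r * (indicator - probs[k])
    return [g / len(samples) for g in grad]

# Hypothetical setup: answer 3 is the only correct one (e.g., it depends on a
# company protocol the model has never seen), so the model rarely produces it.
logits = [2.0, 2.0, 2.0, -8.0]
CORRECT = 3

def reward(answer):
    return 1.0 if answer == CORRECT else 0.0

# Case 1: no sampled answer is correct, so every reward is zero.
miss_samples = [0, 1, 2, 0, 1, 2, 0, 1]
grad_miss = reinforce_gradient(logits, miss_samples, [reward(a) for a in miss_samples])

# Case 2: a single correct sample appears and produces a nonzero signal.
hit_samples = [0, 1, 2, 3, 1, 2, 0, 1]
grad_hit = reinforce_gradient(logits, hit_samples, [reward(a) for a in hit_samples])

print("all misses:", grad_miss)
print("one hit:   ", grad_hit)
```

Practical RL setups add baselines and sample from a full language model, but the core issue is the same: without at least one rewarded attempt, there is no gradient to follow.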

The standard alternative, supervised fine-tuning (SFT), trains models on fixed datasets of expert demonstrations. SFT provides clear ground truth but is “off-policy”: the model imitates the data rather than learning from its own attempts, so it often fails to generalize to out-of-distribution examples and suffers from catastrophic forgetting.
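For contrast, the usual SFT objective is the average negative log-likelihood of the expert's tokens, with each prediction conditioned on the expert's own preceding tokens (teacher forcing). The sketch below is a minimal illustration in which a hypothetical `next_token_probs` function stands in for a language model. The point to notice is that every prefix comes from the fixed demonstration, never from the model's own outputs, which is what makes SFT off-policy.

```python
import math
from typing import Callable, Dict, Sequence

# A "model" here is any function mapping a token prefix to a next-token distribution.
NextTokenProbs = Callable[[Sequence[str]], Dict[str, float]]

def sft_loss(next_token_probs: NextTokenProbs, demonstration: Sequence[str]) -> float:
    """Average negative log-likelihood of an expert demonstration.

    Each prefix is taken from the demonstration itself (teacher forcing),
    so the model is never trained on text it generated.
    """
    total = 0.0
    for i, token in enumerate(demonstration):
        probs = next_token_probs(demonstration[:i])  # expert prefix, not a model sample
        total -= math.log(probs[token])
    return total / len(demonstration)

# Hypothetical toy model: uniform over a four-word vocabulary.
VOCAB = ["file", "the", "claim", "form"]

def uniform_model(prefix: Sequence[str]) -> Dict[str, float]:
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

expert_tokens = ["file", "the", "claim", "form"]
loss = sft_loss(uniform_model, expert_tokens)
print(loss)  # approximately log(4) ≈ 1.386 for a uniform model
```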

SDFT combines elements of SFT and RL: it enables on-policy learning from prerecorded demonstrations without needing a reward function. It is built on distillation, in which a student model learns to mimic a teacher. The researchers used the model’s own in-context learning (ICL) capabilities to create a feedback loop within a single model. ICL allows LLMs to solve new problems from examples provided in the prompt, without any parameter updates.

During training, the model functions in two roles:

  • The teacher: A frozen version of the model receives the query and expert demonstrations. It uses ICL to deduce the correct answer and reasoning.
  • The student: This version receives only the query, simulating real-world deployment.
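The two roles differ only in what goes into the context window. A minimal sketch, assuming a simple hypothetical "Question/Answer" prompt template (the article does not describe the exact format the researchers used); the tool names in the example are invented for illustration:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Demonstration:
    query: str
    response: str  # expert-written answer

def format_example(query: str, answer: str = "") -> str:
    # Hypothetical template; a real setup would use the model's chat format.
    return f"Question: {query}\nAnswer: {answer}".rstrip()

def student_context(query: str) -> str:
    # Student: only the query, exactly as at deployment time.
    return format_example(query)

def teacher_context(query: str, demos: List[Demonstration]) -> str:
    # Teacher: the same query preceded by expert demonstrations,
    # so the model can infer the right answer via in-context learning.
    shots = "\n\n".join(format_example(d.query, d.response) for d in demos)
    return f"{shots}\n\n{format_example(query)}"

demos = [Demonstration("Which tool lists open tickets?", "Call list_tickets(status='open').")]
query = "Which tool closes a ticket?"
print(student_context(query))
print("---")
print(teacher_context(query, demos))
```

Both contexts go to the same underlying model; the teacher copy is kept frozen while only the student's parameters are updated.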

When the student generates an answer, the teacher provides feedback on it, and the student updates its parameters to move its output distribution toward the teacher’s. Because the student learns from outputs it generated itself, this forms an on-policy learning loop: supervision comes from the model’s own interactions and outputs, allowing it to self-correct its reasoning trajectories without an external reward signal.
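The loop can be illustrated at a single token position. This is a sketch under stated assumptions, not the authors' code: it assumes the student is pulled toward the teacher by minimizing the reverse KL divergence KL(student || teacher), a common objective for on-policy distillation, estimated from the student's own samples. The teacher's log-probability of each sampled token plays the role a reward would play in RL, so no external reward function is needed. The vocabulary size and distributions are hypothetical.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(p, q):
    # KL(student || teacher) over a shared vocabulary.
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def self_distillation_step(student_logits, teacher_probs, rng, num_samples=64, lr=0.5):
    """One on-policy update pulling the student toward the frozen teacher.

    Uses the score-function estimate of the reverse-KL gradient:
    grad = E_{a ~ student}[(log p(a) - log q(a)) * d log p(a) / d logits].
    """
    p = softmax(student_logits)
    # On-policy: the tokens being scored are sampled from the student itself.
    samples = rng.choices(range(len(p)), weights=p, k=num_samples)
    grad = [0.0] * len(p)
    for a in samples:
        # The teacher's log-probability supplies the signal; no external reward.
        signal = math.log(p[a]) - math.log(teacher_probs[a])
        for k in range(len(p)):
            grad[k] += signal * ((1.0 if k == a else 0.0) - p[k])
    # Gradient descent on the estimated KL.
    return [z - lr * g / num_samples for z, g in zip(student_logits, grad)]

# Hypothetical toy setup: one token position, four-token vocabulary.
teacher_probs = [0.7, 0.1, 0.1, 0.1]   # frozen teacher that saw the demonstrations
student_logits = [0.0, 0.0, 0.0, 0.0]  # student that saw only the query

rng = random.Random(0)
initial_kl = reverse_kl(softmax(student_logits), teacher_probs)
for _ in range(200):
    student_logits = self_distillation_step(student_logits, teacher_probs, rng)
final_kl = reverse_kl(softmax(student_logits), teacher_probs)
print(f"KL(student || teacher): {initial_kl:.3f} -> {final_kl:.3f}")
```

In a real model the samples would be multi-token rollouts and the teacher's distribution at each position would come from the frozen model conditioned on the demonstrations; the toy collapses all of that to one position to show the direction of the update.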

The researchers validated SDFT using the open-weight Qwen 2.5 model on three enterprise skills: science Q&A, software tool use, and medical reasoning. SDFT learned the new tasks more effectively than SFT. On the Science Q&A benchmark, SDFT reached 70.2% accuracy, compared with 66.2% for SFT.
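Expressed as error rates, the science Q&A gap corresponds to roughly a 12% relative reduction in errors. A quick calculation using only the two accuracy figures reported above:

```latex
\begin{aligned}
\Delta_{\text{acc}} &= 70.2\% - 66.2\% = 4.0 \text{ percentage points} \\
\text{err}_{\text{SFT}} &= 100\% - 66.2\% = 33.8\%, \qquad
\text{err}_{\text{SDFT}} = 100\% - 70.2\% = 29.8\% \\
\text{relative error reduction} &= \frac{33.8 - 29.8}{33.8} \approx 11.8\%
\end{aligned}
```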

SDFT preserves original knowledge. When an SFT model learned the science task, its ability to answer general questions (e.g., logic, humanities) declined. The SDFT model improved on the science task while its “Previous Tasks” score remained stable at 64.5%. This suggests companies can specialize models for departments without degrading basic reasoning or common sense.

In a knowledge-injection experiment using a dataset of fictional “2025 Natural Disasters,” an SFT model memorized the facts but struggled to reason with them, while an SDFT model scored 98% on indirect reasoning questions, suggesting it had internalized the underlying knowledge rather than just the wording.

In a sequential learning experiment, the SDFT model accumulated science, tool-use, and medical skills without regression, whereas the standard model’s performance fluctuated. This capability could eliminate the need to manage multiple models (“model zoos”) for different tasks, potentially reducing inference costs.

The SDFT code is available on GitHub for integration into training workflows. Idan Shenfeld, a doctoral student at MIT and co-author of the paper, noted that the SDFT pipeline is similar to an RL pipeline because it requires generating responses online during training. The team is integrating SDFT into Hugging Face’s Transformer Reinforcement Learning (TRL) library.

SDFT is suited to organizations that need a single model to accumulate multiple skills, especially where defining an RL reward function is difficult or impossible. It does, however, require a model with strong in-context learning capabilities, which in practice means around 4 billion parameters with newer architectures such as Qwen 3.

SDFT is also more expensive to train: it is approximately four times slower and requires about 2.5 times more computational power (FLOPs) than standard fine-tuning, because the model actively generates “rollouts” during training for comparison with the teacher. However, better knowledge retention may avoid the costly multi-stage retraining often needed to address catastrophic forgetting. Smaller models (e.g., 3 billion parameters) initially struggled because they lacked the in-context learning ability needed to act as teachers. Shenfeld noted that the Qwen 3 4B model is strong enough, and he anticipates that 1-billion-parameter models will soon be sufficient.
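The trade-off can be framed as rough arithmetic. In the sketch below, only the 4x wall-clock and 2.5x FLOPs multipliers come from the article; the baseline budget is a hypothetical placeholder, and the break-even calculation assumes, for illustration only, that each corrective retraining pass under SFT costs about as much as the original fine-tune.

```python
# Multipliers reported in the article, relative to standard fine-tuning.
SDFT_TIME_FACTOR = 4.0    # "approximately four times slower"
SDFT_FLOPS_FACTOR = 2.5   # "2.5 times more computational power (FLOPs)"

def sdft_cost(sft_hours: float, sft_flops: float) -> tuple[float, float]:
    """Scale a (hypothetical) SFT budget to an estimated SDFT budget."""
    return sft_hours * SDFT_TIME_FACTOR, sft_flops * SDFT_FLOPS_FACTOR

def breakeven_retraining_passes() -> float:
    """Extra SFT-sized retraining passes at which SFT matches SDFT's FLOPs.

    Assumes (hypothetically) that each pass to repair forgetting costs one SFT run:
    SFT total = (1 + k) * C and SDFT total = 2.5 * C, so k = 2.5 - 1.
    """
    return SDFT_FLOPS_FACTOR - 1.0

# Hypothetical baseline: a 10-hour, 1e18-FLOP supervised fine-tuning run.
hours, flops = sdft_cost(sft_hours=10.0, sft_flops=1e18)
print(f"SDFT estimate: {hours:.0f} hours, {flops:.2e} FLOPs")
print(f"Break-even: {breakeven_retraining_passes():.1f} extra SFT passes")
```

Under these assumptions, SDFT uses fewer total FLOPs than SFT once forgetting would otherwise force more than about one and a half extra full retraining passes; by wall-clock time the break-even is higher, at about three passes.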

The ultimate goal is to move beyond static models toward systems that keep improving as they are used, harnessing inference compute for continual learning.
COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.