Dataconomy

DeepSeek AI introduces NSA: A faster approach to long-context modeling

Researchers have explored Sparse Attention—which selectively processes only the most important information instead of everything

by Kerem Gülen
February 19, 2025
in Research

Large language models (LLMs) are getting smarter, but they’re also hitting a wall: handling long pieces of text is slow and computationally expensive. Standard full attention, the mechanism that lets a model relate every token to every other token, has a cost that grows quadratically with sequence length, which makes long-context models costly to train and run.

Now, researchers from DeepSeek-AI and Peking University have introduced an approach called Native Sparse Attention (NSA). The method promises to make long-context models significantly faster and cheaper to train and run, while maintaining the same level of reasoning capability as traditional approaches.

Why AI’s attention problem needs a fix

Imagine reading a book where you have to keep every sentence in mind at all times—that’s how Full Attention mechanisms work in AI. They scan and store information across long sequences, but as context length grows (think thousands of words), this approach becomes incredibly slow and computationally heavy.
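
To make the scaling concrete, here is a minimal Python sketch that counts how many query-key scores a full-attention layer computes as the context grows. The causal setup (each token attends to itself and to everything before it) and the function name are illustrative assumptions, not details taken from the study.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Count query-key scores under causal full attention (assumed setup)."""
    # Token i attends to tokens 0..i, so the total is 1 + 2 + ... + seq_len.
    return seq_len * (seq_len + 1) // 2


if __name__ == "__main__":
    for n in (1_000, 8_000, 64_000):
        print(f"{n:>6} tokens -> {full_attention_pairs(n):,} scores")
```

Doubling the context length roughly quadruples the number of scores, which is why long documents become expensive so quickly.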

To address this, researchers have explored Sparse Attention—which selectively processes only the most important information instead of everything. However, existing sparse methods have major weaknesses:

  1. They’re hard to train from scratch, often requiring models to first learn with Full Attention before switching to a sparse approach.
  2. They don’t fully optimize for modern hardware, meaning theoretical speed improvements don’t always translate to real-world efficiency.

How NSA changes the game

The team behind NSA, including Jingyang Yuan, Huazuo Gao, Damai Dai, and their colleagues, took a fresh approach. Their method natively integrates sparsity from the start, rather than applying it as an afterthought.

NSA achieves this with two key innovations:

  • Hardware-aligned efficiency: NSA is built to maximize GPU performance, avoiding memory bottlenecks and ensuring real-world speedups.
  • End-to-end trainability: Unlike previous sparse methods, NSA is fully trainable from scratch, reducing training costs without losing accuracy.

Speed and accuracy: The NSA advantage

So, how does NSA stack up against traditional Full Attention models? According to the study, NSA achieves up to 11× speed improvements while still matching—or even outperforming—Full Attention on key benchmarks.

Some of the biggest wins include:

  • Faster processing: NSA speeds up AI’s ability to handle long documents, codebases, and multi-turn conversations.
  • Better reasoning: Despite being “sparse,” NSA models match or exceed Full Attention models in chain-of-thought reasoning tasks.
  • Lower costs: By reducing computation without sacrificing performance, NSA could make advanced AI more affordable to train and deploy.

Existing sparse attention methods

Many existing sparse attention mechanisms attempt to reduce computational overhead by selectively pruning tokens or optimizing memory access. However, they often fall short in practical implementation, either because they introduce non-trainable components or fail to align with modern GPU architectures.

For example:

  • ClusterKV and MagicPIG rely on discrete clustering or hashing techniques, which disrupt gradient flow and hinder model training.
  • H2O and MInference apply sparsity only during specific stages of inference, limiting speed improvements across the full pipeline.
  • Quest and InfLLM use blockwise selection methods, but their heuristic-based scoring often results in lower recall rates.

NSA addresses these limitations by integrating sparsity natively—ensuring efficiency in both training and inference while preserving model accuracy. This means no post-hoc approximations or trade-offs between speed and reasoning capability.
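
To illustrate what blockwise selection looks like in general, the following Python sketch splits the keys into fixed-size blocks, scores each block by the dot product between the query and the block's mean key, and keeps the best-scoring blocks. Mean pooling is a stand-in heuristic chosen for this example; it is not the scoring rule used by Quest, InfLLM, or NSA, and all names and parameters here are hypothetical.

```python
def select_blocks(query, keys, block_size, num_blocks):
    """Pick the num_blocks key blocks whose mean key best matches the query.

    Returns the selected block indices in ascending order.
    """
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]

    def block_score(block):
        # Represent the block by its mean key (illustrative heuristic).
        mean_key = [
            sum(key[d] for key in block) / len(block)
            for d in range(len(query))
        ]
        return sum(q * m for q, m in zip(query, mean_key))

    ranked = sorted(
        range(len(blocks)),
        key=lambda b: block_score(blocks[b]),
        reverse=True,
    )
    return sorted(ranked[:num_blocks])
```

Working on contiguous blocks suits GPUs, which read memory in chunks. The drawback of a coarse heuristic like this one is that a single highly relevant key can be averaged away inside an otherwise irrelevant block, which is one way such methods lose recall.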

NSA’s performance on real-world tasks

To validate NSA’s effectiveness, researchers tested it across a range of AI tasks, comparing its performance with traditional Full Attention models and state-of-the-art sparse attention methods. The results highlight NSA’s ability to match or surpass Full Attention models while significantly reducing computational costs.

General benchmark performance

NSA demonstrated strong accuracy across knowledge, reasoning, and coding benchmarks, including:

  • MMLU & CMMLU: Matching Full Attention in knowledge-based tasks
  • GSM8K & MATH: Outperforming Full Attention in complex reasoning
  • HumanEval & MBPP: Delivering solid coding performance

Long-context understanding

NSA excels at handling long-context sequences in benchmarks like LongBench. In tasks requiring deep contextual memory, NSA maintained:

  • High recall in retrieval tasks (Needle-in-a-Haystack, document QA)
  • Stable accuracy in multi-hop question answering (HotpotQA, 2WikiMultihopQA) and on the GovReport summarization set

Real-world speed gains

The hardware-aligned optimizations in NSA lead to:

  • Up to 9× faster forward computation for 64k-length sequences
  • Up to 6× faster backward computation during training, compared to Full Attention models
  • Reduced memory bandwidth consumption, making large-scale AI applications more feasible
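
The memory bandwidth point can be made concrete with simple arithmetic. When generating one new token, full attention reads every cached key and value, while a sparse method reads at most a fixed budget of them. The sketch below assumes a hypothetical budget of 4,096 tokens purely for illustration; it is not a figure from the study.

```python
from typing import Optional


def kv_tokens_read(context_len: int, sparse_budget: Optional[int] = None) -> int:
    """Cached tokens read to generate one new token.

    sparse_budget is a hypothetical cap on how many cached tokens a
    sparse method touches per step; None means full attention.
    """
    if sparse_budget is None:
        return context_len  # full attention reads the whole cache
    return min(context_len, sparse_budget)


if __name__ == "__main__":
    context = 65_536  # a 64k-token context
    full = kv_tokens_read(context)
    sparse = kv_tokens_read(context, sparse_budget=4_096)
    print(f"full: {full} tokens, sparse: {sparse} tokens, ratio: {full / sparse:.0f}x")
```

Because generation speed at long context is often limited by how fast the cache can be read from memory, cutting the number of tokens read per step is what allows the theoretical savings to show up as real speedups.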