Dataconomy

NVIDIA’s nGPT model cuts AI training time by up to 20x

by Kerem Gülen
October 21, 2024
in Artificial Intelligence

NVIDIA has unveiled a major advancement in AI model training: the Normalized Transformer (nGPT). This new architecture, designed to make training large language models (LLMs) more efficient, can reduce the number of training steps needed to reach the same accuracy by a factor of 4 to 20, depending on the context length, while maintaining model stability and accuracy. By reaching a given level of performance in fewer steps, nGPT offers a way to train models with fewer resources.

What makes nGPT different: Hyperspherical learning

At the core of nGPT’s efficiency is a concept called hyperspherical representation learning. In a standard transformer, the vectors that make up embeddings, weight matrices, and hidden states can take on arbitrary lengths (norms), and separate normalization layers are needed to keep activations in check. nGPT instead normalizes these vectors to unit length, so the vectors forming the embedding, attention, and MLP matrices, as well as the hidden states, all lie on the surface of a hypersphere. This shared geometry is meant to keep every layer of the model on the same scale during training, making learning more stable and efficient.
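
To make the hypersphere idea concrete, here is a minimal Python sketch, assuming only that "mapping onto a hypersphere" means rescaling each vector to unit L2 norm. The token names and vector values are hypothetical toy data, not anything taken from NVIDIA's model. Once every vector has length 1, the dot product of two vectors is exactly their cosine similarity, bounded between -1 and 1, no matter how large the raw vectors were.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Project a vector onto the unit hypersphere (rescale to L2 norm 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical 4-dimensional token embeddings with very different raw norms.
raw_embeddings = {
    "cat": [3.0, 4.0, 0.0, 0.0],
    "dog": [0.3, 0.5, 0.1, 0.0],
    "car": [0.0, 0.0, 10.0, 2.0],
}

unit_embeddings = {tok: normalize(v) for tok, v in raw_embeddings.items()}

for tok, v in unit_embeddings.items():
    # Every vector now lies on the hypersphere, so its norm is 1.
    print(tok, round(math.sqrt(dot(v, v)), 6))

# On the hypersphere, a dot product is a cosine similarity in [-1, 1].
print("cat/dog:", round(dot(unit_embeddings["cat"], unit_embeddings["dog"]), 3))
print("cat/car:", round(dot(unit_embeddings["cat"], unit_embeddings["car"]), 3))
```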

This approach significantly reduces the number of training steps required. Because its weights are kept at unit norm, nGPT does not apply weight decay to them as earlier models do; instead, it relies on learned scaling parameters that control how strongly each layer updates the model’s hidden state during training. The design also removes the need for separate normalization layers such as LayerNorm or RMSNorm, which makes the architecture simpler.
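
The sketch below illustrates how a learned scaling parameter could take over the role of LayerNorm or RMSNorm in a residual update. It assumes an update of the form h ← Norm(h + α ⊙ (Norm(block(h)) − h)), which is based on the published nGPT paper rather than on this article, so treat it as illustrative. The function name `normalized_residual_step` and the fixed `alpha` values are hypothetical; in a real model, α would be a learned per-dimension vector and the computation would run on batched tensors.

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Rescale a vector to unit L2 norm, i.e. project it onto the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def normalized_residual_step(
    h: list[float], block_out: list[float], alpha: list[float]
) -> list[float]:
    """Hypothetical nGPT-style update (an assumption, not NVIDIA's code).

    h:         current hidden state, assumed to be unit norm
    block_out: raw output of an attention or MLP block
    alpha:     per-dimension step sizes (the learned "scaling parameters")
    """
    target = normalize(block_out)  # the block's suggestion, placed on the sphere
    # Move part of the way from h toward the block's suggestion.
    moved = [hi + ai * (ti - hi) for hi, ti, ai in zip(h, target, alpha)]
    # Re-project onto the sphere; no LayerNorm/RMSNorm layer is involved.
    return normalize(moved)

h = normalize([1.0, 0.0, 0.0])
attn_out = [0.0, 5.0, 0.0]  # toy block output with a large raw norm

small_step = normalized_residual_step(h, attn_out, alpha=[0.1, 0.1, 0.1])
large_step = normalized_residual_step(h, attn_out, alpha=[0.9, 0.9, 0.9])
print(small_step)  # stays close to h
print(large_step)  # moves most of the way toward the block's direction
```

Because the result is always renormalized, the hidden state's norm cannot drift however large the block output is, which is one intuition for why, in this design, separate normalization layers and weight decay are no longer needed.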

Faster training with fewer resources

In experiments on the OpenWebText dataset, nGPT consistently outperformed a baseline GPT model in training speed and efficiency. With text inputs as long as 4,000 tokens, nGPT needed far fewer training steps to reach a similar validation loss, drastically reducing the amount of training these complex models require.
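
A speedup like the one described here can be expressed as a ratio of training steps: the number of steps the baseline needs to reach a given validation loss, divided by the number the new model needs. The sketch below computes that ratio from two logged loss curves. The curves are invented toy numbers chosen only to illustrate the arithmetic, not results from NVIDIA's experiments, and the helper names are hypothetical.

```python
from typing import Optional, Sequence

def steps_to_reach(
    losses: Sequence[float], steps: Sequence[int], target: float
) -> Optional[int]:
    """Return the first logged step whose validation loss is <= target, else None."""
    for step, loss in zip(steps, losses):
        if loss <= target:
            return step
    return None

def step_speedup(
    baseline_losses: Sequence[float],
    new_losses: Sequence[float],
    steps: Sequence[int],
    target: float,
) -> float:
    """Ratio of steps the baseline needs vs. the new model to hit the same target loss."""
    base = steps_to_reach(baseline_losses, steps, target)
    new = steps_to_reach(new_losses, steps, target)
    if base is None or new is None:
        raise ValueError("one of the runs never reached the target loss")
    return base / new

# Invented toy curves (NOT NVIDIA's measurements), logged at the same steps.
steps = [1000, 2000, 4000, 8000, 16000, 32000]
baseline = [4.10, 3.80, 3.50, 3.30, 3.15, 3.05]
ngpt = [3.60, 3.30, 3.12, 3.02, 2.97, 2.95]

print(step_speedup(baseline, ngpt, steps, target=3.05))
```

A step-based ratio translates directly into wall-clock savings only if one training step of each model takes roughly the same time; if one architecture's steps are slower, the time savings will be smaller than the step ratio.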

nGPT’s hyperspherical structure also provides better embedding separability: embeddings for different inputs are spread further apart, so the model can distinguish between inputs more easily, which contributes to higher accuracy on standard benchmarks. The model also generalizes better to tasks beyond its training objective, and it converges faster without sacrificing accuracy.
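
One simple way to quantify separability, offered here as an illustrative proxy rather than the metric NVIDIA used, is to measure the pairwise cosine similarity between embeddings: the lower the average and worst-case similarity, the more distinct the directions the vectors point in. Both embedding sets below are hypothetical toy data, and `separability_report` is a hypothetical helper name.

```python
import itertools
import math

def normalize(v: list[float]) -> list[float]:
    """Rescale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity = dot product of the unit-norm versions of a and b."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def separability_report(embeddings: list[list[float]]) -> tuple[float, float]:
    """Return (mean, max) pairwise cosine similarity; lower means better separated."""
    sims = [cosine(a, b) for a, b in itertools.combinations(embeddings, 2)]
    return sum(sims) / len(sims), max(sims)

# Hypothetical toy embeddings: one set clustered in similar directions, one spread out.
crowded = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.15]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

crowded_mean, crowded_max = separability_report(crowded)
spread_mean, spread_max = separability_report(spread)
print(f"crowded: mean={crowded_mean:.3f}, max={crowded_max:.3f}")
print(f"spread:  mean={spread_mean:.3f}, max={spread_max:.3f}")
```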

Why this matters for AI training

A key advantage of nGPT is that it unifies normalization and representation learning in a single framework. This simplifies the model’s architecture, which could make it easier to scale and to adapt for more complex hybrid systems. Because the approach could be integrated into other types of models and architectures, it may contribute to even more powerful AI systems in the future.


Featured image credit: Kerem Gülen/Ideogram

COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
