Dataconomy

Nvidia hits 200 teraFLOP emulated FP64 for scientific computing

The AI accelerator achieves 200 teraFLOPS of double-precision matrix compute by repurposing Tensor cores for HPC workloads.

by Kerem Gülen
January 19, 2026
in Tech, News

Nvidia is employing software emulation to boost double-precision floating-point (FP64) performance on its AI accelerators for high-performance computing (HPC) and scientific applications, according to The Register. The move comes as the company unveils its Rubin GPUs, which deliver 33 teraFLOPS of peak FP64 performance, 1 teraFLOP less than the H100 GPU.

Nvidia’s CUDA libraries can achieve up to 200 teraFLOPS of FP64 matrix performance through software emulation, a 4.4x increase over the Blackwell accelerators’ hardware capabilities. Dan Ernst, Nvidia’s senior director of supercomputing products, stated that the accuracy of emulation matches or exceeds that of tensor core hardware. However, Nicholas Malaya, an AMD fellow, questioned whether emulated FP64’s performance on benchmarks would carry over to real physical scientific simulations.

FP64 remains critical for scientific computing because of its precision and dynamic range: the 64-bit format can express more than 18.44 quintillion distinct values, compared with the 256 values of the FP8 format used in AI models. Unlike AI workloads, HPC simulations require high precision to prevent error propagation that can make a simulation unstable, according to Malaya.
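Malaya's error-propagation point can be illustrated with a toy accumulation loop (a sketch, not Nvidia's method): repeatedly adding a small term in a 16-bit float stalls once the increment falls below the accumulator's rounding grid, while FP64 tracks the true sum closely.

```python
import numpy as np

# Sum 10,000 increments of 1e-3. In float64 the result stays close
# to the true sum of 10.0; in float16 the accumulator stops growing
# once adding 1e-3 rounds away to nothing.
n, step = 10_000, 1e-3

acc64 = np.float64(0.0)
acc16 = np.float16(0.0)
for _ in range(n):
    acc64 = np.float64(acc64 + step)
    acc16 = np.float16(acc16 + np.float16(step))

print(float(acc64))  # ~10.0, the true sum
print(float(acc16))  # stalls far short of 10.0 as rounding swallows the updates
```

The same mechanism, compounded over millions of timesteps, is what makes low-precision arithmetic risky for long-running simulations.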


The concept of using lower-precision data types to emulate FP64 dates back to the mid-20th century. In early 2024, researchers from the Tokyo and Shibaura institutes of technology published a paper demonstrating that FP64 matrix operations could be decomposed into multiple INT8 operations on Nvidia’s tensor cores, achieving higher-than-native performance. This method, known as the Ozaki scheme, forms the basis for Nvidia’s FP64 emulation libraries, released late last year. Ernst confirmed the emulated computation maintains FP64 precision, differing only in its hardware execution method.
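The splitting idea behind the Ozaki scheme can be sketched in miniature (greatly simplified: two float64 pieces per element via a classic Dekker split, rather than the INT8 slices Nvidia runs on tensor cores). Each factor is decomposed so the partial products carry short significands, and the partial matrix multiplies are summed to recover the full-precision result.

```python
import numpy as np

def dekker_split(a, s=27):
    # hi keeps the top ~27 significand bits of each element; lo is the rest,
    # so a == hi + lo exactly.
    c = (2.0 ** s + 1.0) * a
    hi = c - (c - a)
    return hi, a - hi

def split_matmul(A, B):
    # Products of the short-significand pieces fit (almost) exactly in FP64;
    # summing the four partial GEMMs reconstructs the full product.
    Ah, Al = dekker_split(A)
    Bh, Bl = dekker_split(B)
    return Ah @ Bh + Ah @ Bl + Al @ Bh + Al @ Bl

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64))
B = rng.standard_normal((64, 64))
residual = np.max(np.abs(split_matmul(A, B) - A @ B))
print(residual)  # tiny: the decomposition reproduces the FP64 product
```

The production scheme pushes this further, slicing each significand into many narrow integer pieces so the cross products can run on low-precision tensor cores.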

Modern GPUs feature low-precision tensor cores, such as those in Rubin, which offer 35 petaFLOPS of dense FP4 compute. These cores are over 1,000x faster than FP64-specific components. Ernst explained that the efficiency of these low-precision cores led to exploring their use for FP64 emulation, aligning with the historical trend in supercomputing of leveraging available hardware.

AMD has expressed reservations about the accuracy of FP64 emulation. Malaya noted that the approach performs well for well-conditioned numerical systems, such as the High Performance Linpack (HPL) benchmark, but can falter on the ill-conditioned systems found in materials science or combustion codes. He also highlighted that Nvidia’s FP64-emulation algorithms are not fully IEEE compliant, overlooking nuances such as the distinction between positive and negative zero or the handling of “not a number” (NaN) values. Such discrepancies can let small errors propagate and affect final results. Malaya added that the Ozaki scheme roughly doubles memory consumption for FP64 matrices. AMD’s upcoming MI430X will instead boost double- and single-precision hardware performance using its chiplet architecture.
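The IEEE 754 corner cases Malaya cites are easy to demonstrate in ordinary FP64 arithmetic; an emulation layer that flushes them away loses information that native hardware preserves.

```python
import numpy as np

# Signed zeros compare equal but carry sign information that matters
# downstream, e.g. when a zero later appears in a division.
pos, neg = np.float64(0.0), np.float64(-0.0)
print(pos == neg)                                     # True under comparison
print(np.copysign(1.0, pos), np.copysign(1.0, neg))   # 1.0 -1.0: the sign survives
with np.errstate(divide="ignore"):
    print(np.float64(1.0) / pos, np.float64(1.0) / neg)  # inf -inf

# NaNs must propagate through arithmetic rather than be silently dropped.
nan = np.float64("nan")
print(nan == nan)                                     # False: NaN never equals itself
with np.errstate(invalid="ignore"):
    print(np.float64(0.0) * np.float64(np.inf))       # nan
```

A code that branches on the sign of a zero, or relies on NaNs to flag a diverging solve, would behave differently under an emulation scheme that does not reproduce these rules.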

Ernst acknowledged some limitations but contended that issues like positive versus negative zero are not critical for most HPC practitioners. Nvidia has developed supplemental algorithms to detect and mitigate problems such as NaNs and infinities. He stated that the increased memory overhead is relative to the operation, not the entire application, with typical matrices occupying a few gigabytes. Ernst argued that IEEE compliance issues rarely arise in matrix multiplication, especially in DGEMM operations.

Emulation primarily benefits the subset of HPC applications that rely on dense general matrix multiply (DGEMM) operations. Malaya estimated that 60% to 70% of HPC workloads, particularly those built on vector fused multiply-add (FMA), see little to no benefit from emulation. For vector-heavy workloads like computational fluid dynamics, Nvidia’s Rubin GPUs must fall back on the slower FP64 vector units within their CUDA cores. Ernst countered that theoretical FLOPS do not always translate into usable performance, particularly when memory bandwidth is the bottleneck. Rubin, with 22 TB/s of HBM4 memory bandwidth, is expected to deliver higher real-world performance on these workloads despite its slower vector FP64 rate.
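Ernst's bandwidth argument can be sanity-checked with simple roofline arithmetic, using the figures quoted above (illustrative only; the DAXPY kernel is a hypothetical stand-in for a vector-heavy workload).

```python
# Roofline back-of-envelope: attainable throughput is capped by
# min(compute peak, arithmetic intensity * memory bandwidth).
hbm_bw = 22e12       # Rubin HBM4 bandwidth quoted in the article, bytes/s
fp64_peak = 33e12    # Rubin peak FP64 quoted in the article, flop/s

# DAXPY (y = a*x + y) moves 3 float64 values (24 bytes) per element
# for 2 flops, so its arithmetic intensity is very low.
intensity = 2 / 24   # flops per byte
attainable = min(fp64_peak, intensity * hbm_bw)
print(attainable / 1e12)  # ~1.83 Tflop/s: bandwidth, not FP64 peak, sets the ceiling
```

At this intensity the kernel reaches under 6% of the FP64 peak, which is why extra memory bandwidth can matter more than raw vector FLOPS for such workloads.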

The viability of FP64 emulation will be tested as new supercomputers incorporating Nvidia’s Blackwell and Rubin GPUs become operational. The algorithms can improve over time given their software-based nature. Malaya indicated that AMD is also exploring FP64 emulation on chips like the MI355X via software flags. He emphasized that IEEE compliance would validate the approach by guaranteeing result consistency with dedicated silicon. Malaya suggested that the community should establish a suite of applications to evaluate the reliability of emulation across different use cases.



Tags: Nvidia


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
