Llama 3.1 Nemotron Instruct 70B

NVIDIA
2024-10-15
Modality: Text
Intelligence
13.4
#386/532
Coding
10.8
#399/88
Math
11
#230/265
Speed
32 tok/s
TTFT: 567.00s
Pricing
$1.20 / $1.20
per 1M tokens (in/out)

Llama 3.1 Nemotron Instruct 70B is NVIDIA’s instruction-following model, released on October 15, 2024. It generates output at 32.39 tokens per second and is priced at $1.20 per million tokens for both input and output, targeting professional users in AI and machine learning.

When to Use Llama 3.1 Nemotron Instruct 70B

✓ Best For

  • Instruction-based applications
  • Coding assistance
  • Mathematical problem solving

✗ Not Ideal For

  • Users requiring extremely high-speed processing
  • Basic conversational tasks

How Llama 3.1 Nemotron Instruct 70B Compares

[Chart: Intelligence Index (higher is better). Providers shown: Cohere, Amazon, NVIDIA, xAI, Alibaba]

[Chart: Output Speed (tok/s). Providers shown: OpenAI, NVIDIA, Alibaba]

[Chart: Math Index. Providers shown: Google, Nous Research, NVIDIA, AI21 Labs]

[Chart: Benchmark Profile. Axes: Intelligence, Coding, Math]

All Benchmark Scores (15)

Benchmark Score
Intelligence Index 13.4
Coding Index 10.8
Math Index 11
MMLU-Pro 69%
GPQA 46.5%
LiveCodeBench 16.9%
HLE 4.6%
SciCode 23.3%
IFBench 30.7%
LCR 7%
TerminalBench Hard 4.5%
Tau2 23.1%
AIME 24.7%
AIME 2025 11%
MATH 500 73.3%

Data: Artificial Analysis · Updated: March 25, 2026

Frequently Asked Questions (15)

When was Llama 3.1 Nemotron Instruct 70B released?
Llama 3.1 Nemotron Instruct 70B was released on October 15, 2024.
Who created Llama 3.1 Nemotron Instruct 70B?
Llama 3.1 Nemotron Instruct 70B was created by NVIDIA.
How intelligent is Llama 3.1 Nemotron Instruct 70B?
Llama 3.1 Nemotron Instruct 70B scores 13.4 on the Artificial Analysis Intelligence Index, slightly above the median of 13 for other open weight non-reasoning models of similar size.
How fast is Llama 3.1 Nemotron Instruct 70B?
Llama 3.1 Nemotron Instruct 70B generates output at 31.8 tokens per second (based on the median across providers serving the model), which is at the lower end compared to other open weight non-reasoning models of similar size (median: 62.4 t/s).
What is the latency of Llama 3.1 Nemotron Instruct 70B?
Llama 3.1 Nemotron Instruct 70B has a time to first token (TTFT) of 2.08s (based on the median across providers serving the model), which is somewhat higher than average compared to other open weight non-reasoning models of similar size (median: 1.47s).
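As a rough illustration of how the latency and speed figures combine, the time to receive a full response can be approximated as TTFT plus output length divided by output speed. The sketch below assumes that simple additive model and uses the median figures quoted above as defaults; the function name is hypothetical, and real timings will vary by provider and load.

```python
def estimate_response_seconds(output_tokens: int,
                              ttft_s: float = 2.08,
                              tokens_per_s: float = 31.8) -> float:
    """Approximate wall-clock seconds for one complete response.

    Assumes a simple additive model: wait for the first token (TTFT),
    then stream the remaining output at a constant rate.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Print estimates for a few illustrative response lengths.
    for n in (100, 500, 2000):
        print(f"{n} output tokens: ~{estimate_response_seconds(n):.1f} s")
```

Under this model, generation time dominates for long outputs, so the below-median output speed matters more than the TTFT for lengthy responses.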
How much does Llama 3.1 Nemotron Instruct 70B cost?
Llama 3.1 Nemotron Instruct 70B costs $1.20 per 1M input tokens (somewhat higher than average, median: $0.52) and $1.20 per 1M output tokens (somewhat higher than average, median: $0.81), based on the median across providers serving the model.
What is Llama 3.1 Nemotron Instruct 70B API pricing?
Llama 3.1 Nemotron Instruct 70B costs $1.20 per 1M input tokens and $1.20 per 1M output tokens (based on the median across providers serving the model). For a blended rate (3:1 input to output ratio), this is $1.20 per 1M tokens. Pricing may vary by provider.
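The blended rate is a weighted average of the input and output prices. The sketch below shows that arithmetic and a per-request cost estimate; the function names are hypothetical, and the default prices are the median figures quoted above, not any particular provider's rates.

```python
def blended_price_per_million(input_price: float,
                              output_price: float,
                              input_parts: float = 3.0,
                              output_parts: float = 1.0) -> float:
    """Weighted-average price per 1M tokens for a given input:output ratio."""
    total_parts = input_parts + output_parts
    if total_parts <= 0:
        raise ValueError("ratio parts must sum to a positive number")
    return (input_price * input_parts + output_price * output_parts) / total_parts


def request_cost_usd(input_tokens: int,
                     output_tokens: int,
                     input_price: float = 1.20,
                     output_price: float = 1.20) -> float:
    """Cost in USD of one request, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    print(f"Blended (3:1): ${blended_price_per_million(1.20, 1.20):.2f} per 1M tokens")
    print(f"3,000 in + 1,000 out: ${request_cost_usd(3000, 1000):.4f}")
```

Because the input and output prices are equal here, the blended rate stays at $1.20 whatever ratio is used; the ratio only matters when the two prices differ.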
How verbose is Llama 3.1 Nemotron Instruct 70B?
When evaluated on the Intelligence Index, Llama 3.1 Nemotron Instruct 70B generated 3.8M output tokens, which is in line with the median for other open weight non-reasoning models of similar size (3.8M).
Is Llama 3.1 Nemotron Instruct 70B a reasoning model?
No, Llama 3.1 Nemotron Instruct 70B is not a reasoning model. It provides direct responses without extended chain-of-thought reasoning.
What input modalities does Llama 3.1 Nemotron Instruct 70B support?
Llama 3.1 Nemotron Instruct 70B supports text-only input.
What output modalities does Llama 3.1 Nemotron Instruct 70B support?
Llama 3.1 Nemotron Instruct 70B supports text-only output.
Can Llama 3.1 Nemotron Instruct 70B process images?
No, Llama 3.1 Nemotron Instruct 70B does not support image input. It can only process text.
Is Llama 3.1 Nemotron Instruct 70B multimodal?
No, Llama 3.1 Nemotron Instruct 70B is not multimodal. It supports only text input.
What is the context window of Llama 3.1 Nemotron Instruct 70B?
Llama 3.1 Nemotron Instruct 70B has a context window of 130k tokens. This determines how much text and conversation history the model can process in a single request.
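A minimal sketch of a pre-flight check against this limit follows. It assumes "130k" means 130,000 tokens and that the prompt and the generated output share the same window; both are assumptions, token counts must come from the model's own tokenizer, and the names are hypothetical.

```python
# Assumption: "130k" is read as 130,000 tokens; the exact limit may differ by provider.
CONTEXT_WINDOW_TOKENS = 130_000


def fits_in_context(prompt_tokens: int,
                    max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the prompt plus the reserved output budget fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= context_window


if __name__ == "__main__":
    print(fits_in_context(120_000, 8_000))
    print(fits_in_context(128_000, 4_000))
```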
Is Llama 3.1 Nemotron Instruct 70B open source?
Yes, Llama 3.1 Nemotron Instruct 70B is an open-weights model. The model weights are publicly available and can be downloaded for self-hosting.