Dataconomy

Instella is here: AMD’s 3B-parameter model takes on Llama and Gemma

Instella employs an autoregressive transformer architecture consisting of 36 decoder layers and 32 attention heads, enabling it to process lengthy sequences of up to 4,096 tokens

by Kerem Gülen
March 7, 2025
in Artificial Intelligence, News

AMD has unveiled Instella, a family of fully open-source language models with 3 billion parameters, trained from scratch on AMD Instinct™ MI300X GPUs. According to AMD, Instella models outperform existing fully open models of similar size and compete effectively with leading open-weight models such as Llama-3.2-3B, Gemma-2-2B, and Qwen-2.5-3B, along with their instruction-tuned versions.

AMD unveils Instella: Open-source language models outperforming rivals

Instella employs an autoregressive transformer architecture consisting of 36 decoder layers and 32 attention heads, enabling it to process lengthy sequences of up to 4,096 tokens. The model utilizes a vocabulary of approximately 50,000 tokens, managed by the OLMo tokenizer, making it adept at generating and interpreting text across various domains.

The training procedure for Instella highlights collaboration between AMD’s hardware and software innovations. This new model builds on the groundwork established by AMD’s previous 1-billion-parameter models, transitioning from training on 64 AMD Instinct MI250 GPUs with 1.3 trillion tokens to using 128 Instinct MI300X GPUs with 4.15 trillion tokens for the current 3-billion-parameter model.
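To put that scale-up in perspective, the figures in the paragraph above can be turned into simple ratios. The sketch below is back-of-the-envelope arithmetic only: the `6 * parameters * tokens` estimate of training compute is a common rule of thumb rather than anything AMD reports, the parameter counts are rounded to 1 and 3 billion, and the names `PREV`, `INSTELLA`, `ratio`, and `approx_train_flops` are illustrative.

```python
# Figures quoted in the article; everything derived from them is an estimate.
PREV = {"params": 1e9, "tokens": 1.3e12, "gpus": 64}        # earlier 1B models, MI250
INSTELLA = {"params": 3e9, "tokens": 4.15e12, "gpus": 128}  # Instella 3B, MI300X


def ratio(key: str) -> float:
    """How many times larger the Instella figure is than the previous one."""
    return INSTELLA[key] / PREV[key]


def approx_train_flops(run: dict) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token (assumption)."""
    return 6 * run["params"] * run["tokens"]


if __name__ == "__main__":
    for key in ("params", "tokens", "gpus"):
        print(f"{key}: {ratio(key):.2f}x")
    growth = approx_train_flops(INSTELLA) / approx_train_flops(PREV)
    print(f"approx. training compute: {growth:.1f}x")
```

Under these assumptions the token budget grows by roughly 3.2x and the GPU count doubles, while the estimated training compute grows by a little under 10x. The sketch ignores the per-GPU difference between MI250 and MI300X, which the article does not quantify.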


AMD reports that Instella not only surpasses existing fully open models but also achieves competitive performance against state-of-the-art open-weight models. The company presents the release as part of its commitment to making advanced technology more accessible and to fostering collaboration and innovation within the AI community.




Instella model phases and training data

This release includes several versions of the Instella models, each representing a different training stage:

| Model | Stage | Training Data (Tokens) | Description |
|---|---|---|---|
| Instella-3B-Stage1 | Pre-training (Stage 1) | 4.065 trillion | First stage pre-training to develop proficiency in natural language. |
| Instella-3B | Pre-training (Stage 2) | 57.575 billion | Second stage pre-training to enhance problem-solving capabilities. |
| Instella-3B-SFT | SFT | 8.902 billion (x3 epochs) | Supervised fine-tuning (SFT) to enable instruction-following capabilities. |
| Instella-3B-Instruct | DPO | 760 million | Alignment to human preferences and enhancement of chat capabilities with direct preference optimization (DPO). |

In the multi-stage training pipeline, the first pre-training stage used 4.065 trillion tokens from diverse datasets, establishing foundational language understanding. The subsequent training on an additional 57.575 billion tokens further enhanced the model’s performance across varied tasks and domains.

During supervised fine-tuning, Instella-3B-SFT was trained on 8.9 billion tokens for three epochs to improve its instruction-following and interactive response capabilities. In the final stage, Instella-3B-Instruct underwent alignment training with Direct Preference Optimization (DPO) on 0.76 billion tokens, with the aim of aligning the model's outputs more closely with human preferences.
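The per-stage token counts can be added up as a quick sanity check. The sketch below assumes that the three SFT epochs are each counted toward the total, which is an interpretation and not something the article states. The names `STAGES`, `tokens_seen`, and `total_tokens_seen` are illustrative.

```python
# Token counts per stage as listed in the article (in billions of tokens).
# The third field is the number of epochs; counting each epoch is an assumption.
STAGES = [
    ("Instella-3B-Stage1", 4065.0, 1),   # pre-training stage 1
    ("Instella-3B", 57.575, 1),          # pre-training stage 2
    ("Instella-3B-SFT", 8.902, 3),       # supervised fine-tuning, 3 epochs
    ("Instella-3B-Instruct", 0.760, 1),  # DPO alignment
]


def tokens_seen(billions: float, epochs: int) -> float:
    """Tokens processed in a stage, counting every epoch (in billions)."""
    return billions * epochs


def total_tokens_seen() -> float:
    """Sum over all stages, in billions of tokens."""
    return sum(tokens_seen(b, e) for _, b, e in STAGES)


if __name__ == "__main__":
    for name, billions, epochs in STAGES:
        print(f"{name}: {tokens_seen(billions, epochs):,.3f}B tokens")
    print(f"total: {total_tokens_seen() / 1000:.2f}T tokens")
```

Under that assumption the stages add up to roughly 4.15 trillion tokens, consistent with the total quoted for the 3-billion-parameter model.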

AMD has released all artifacts associated with the Instella models as open source, including model weights, training configurations, datasets, and code. These resources can be accessed via Hugging Face model cards and GitHub repositories.


