Dataconomy

LLM Colosseum pushes AI limits with Street Fighter III duels

by Eray Eliaçık
April 8, 2024
in Artificial Intelligence

Picture a digital arena where Large Language Models (LLMs) step out of their text-based comfort zone and into the electrifying world of Street Fighter III. That’s the essence of the LLM Colosseum, a clever way to benchmark LLMs.

What’s the idea?

The LLM Colosseum was conceived with a simple yet ambitious idea: push AI beyond conventional tasks. By having LLMs duke it out in Street Fighter III, its creators set out to explore the models’ adaptability and strategic skill in a dynamic gaming environment.

Introducing LLM Colosseum ! 🔥

Evaluate LLMs quality by having them fight in realtime in Street Fighter III !

Who is the best ? @OpenAI or @MistralAI ?

Let them fight ! Open source code and ranking 👇 pic.twitter.com/GF6HOkVHIA

— Stan Girard (@_StanGirard) March 24, 2024


Behind the scenes, the Colosseum relies on emulators and APIs to recreate the fast-paced action of Street Fighter III. The LLMs control characters such as Ken or Ryu, using their language-processing abilities to make split-second decisions and execute moves in the game.

How do they play?

In the LLM Colosseum, each player is controlled by an LLM that reads a text description of the game screen and responds to it. In this agent-based approach, each LLM independently decides its character’s next moves based on its previous actions, its opponent’s moves, and its own power and health.
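The first step of that loop is turning the game state into text the model can reason about. Below is a minimal sketch, assuming a hypothetical `GameState` record and `build_prompt` helper (neither is taken from the project’s code), of how health, power, distance, and recent moves might be packed into a prompt.

```python
from dataclasses import dataclass, field


# Hypothetical snapshot of what an agent "sees" on each turn.
@dataclass
class GameState:
    own_health: int          # 0-100
    opponent_health: int     # 0-100
    own_power: int           # super meter, 0-100
    distance: str            # e.g. "close", "mid", "far"
    own_last_moves: list[str] = field(default_factory=list)
    opponent_last_moves: list[str] = field(default_factory=list)


def build_prompt(state: GameState, character: str = "Ken") -> str:
    """Describe the current game state in plain text for the LLM."""
    lines = [
        f"You are {character} in Street Fighter III.",
        f"Your health: {state.own_health}. Opponent health: {state.opponent_health}.",
        f"Your super meter: {state.own_power}.",
        f"Distance to opponent: {state.distance}.",
        # Recent history lets the model adapt to what the opponent is doing.
        f"Your last moves: {', '.join(state.own_last_moves) or 'none'}.",
        f"Opponent's last moves: {', '.join(state.opponent_last_moves) or 'none'}.",
        "Reply with your next moves as a comma-separated list.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(GameState(80, 45, 100, "close", ["Fireball"], ["Jump"])))
```

Keeping the prompt as short, labeled lines makes it cheap to send every turn, which matters when the model must answer quickly.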

To keep gameplay smooth and responsive, the system uses multithreading: the game engine can run several processes at once, so the LLMs and the game environment interact in real time. The goal is for battles to unfold dynamically with as little delay as possible.
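To illustrate why threads help, here is a minimal sketch in which a stand-in `fake_llm` function replaces a real model call; `agent_loop` and `run_match` are likewise hypothetical names, not the project’s. Each agent waits on its slow "model" in its own thread, while the game loop keeps running and executes whatever decisions are ready from a thread-safe queue.

```python
import queue
import threading
import time


def fake_llm(player: str, turn: int) -> str:
    """Hypothetical stand-in for a slow LLM API call."""
    time.sleep(0.05)  # simulate network/model latency
    return f"{player}-move-{turn}"


def agent_loop(player: str, turns: int, moves: queue.Queue) -> None:
    # Each agent "thinks" in its own thread and posts decisions to a shared queue.
    for turn in range(turns):
        moves.put((player, fake_llm(player, turn)))


def run_match(turns: int = 3) -> list[tuple[str, str]]:
    moves: queue.Queue = queue.Queue()
    agents = [
        threading.Thread(target=agent_loop, args=(p, turns, moves))
        for p in ("player1", "player2")
    ]
    for t in agents:
        t.start()

    executed = []
    # The game loop never blocks on one player: it executes any ready move,
    # and otherwise just keeps ticking.
    while len(executed) < 2 * turns:
        try:
            executed.append(moves.get(timeout=0.01))
        except queue.Empty:
            pass  # no decision yet; a real game would advance a frame here
    for t in agents:
        t.join()
    return executed


if __name__ == "__main__":
    for player, move in run_match():
        print(player, move)
```

Because both agents wait on their models concurrently, a slow answer from one player does not freeze the other or the game loop.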

With this combination of agent-based control, multithreading, and real-time processing, the LLM Colosseum delivers an immersive gaming experience where AI entities engage in fast-paced combat, showcasing their decision-making skills and adaptability in the heat of battle.

LLMs participating in the Colosseum control characters like Ken or Ryu, making split-second decisions based on text descriptions of the game screen

As the virtual fighters take their positions, LLMs analyze the game state and craft their moves based on contextual prompts. Whether it’s launching a devastating super move or timing a precise counter-attack, each decision reflects the AI’s understanding of the game mechanics and its strategic approach to victory.
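Since a model answers in free text, its reply has to be mapped back onto moves the game understands. One simple way to do that is sketched below, with a hypothetical `VALID_MOVES` vocabulary and `parse_moves` helper (the project defines its own move set): split the reply, normalize each piece, and keep only recognized moves.

```python
import re

# Hypothetical move vocabulary, for illustration only.
VALID_MOVES = {
    "move closer", "move away", "fireball", "low kick",
    "high punch", "jump", "block", "super move",
}


def parse_moves(reply: str, max_moves: int = 3) -> list[str]:
    """Extract up to max_moves known moves from free-form LLM text."""
    moves = []
    for chunk in re.split(r"[,\n;]", reply):
        # Drop list bullets, stray punctuation, and case differences.
        move = chunk.strip(" -*.").lower()
        if move in VALID_MOVES:
            moves.append(move)
        if len(moves) == max_moves:
            break
    return moves


if __name__ == "__main__":
    print(parse_moves("Move closer, Fireball, dance, Super move, jump"))
```

Silently dropping unknown moves keeps a rambling or hallucinated answer from crashing the match; a stricter setup could instead re-prompt the model.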

Who won?

The Street Fighter III battles at the LLM Colosseum did not produce a single clear champion. Several models, including claude_3_haiku, claude_3_sonnet, and claude_2, stood out on the leaderboard. The competition was less about crowning a winner than about understanding how different AI models perform in gaming scenarios: each match offered insight into how these models think and make decisions in dynamic situations, making the event an exciting exploration of AI capabilities.
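The article does not say how the leaderboard is computed. As an illustration only, a common way to turn head-to-head results into a ranking is the Elo rating system; the sketch below assumes a K-factor of 32, a starting rating of 1500, and made-up model names, and is not presented as the project’s scoring code.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that a player rated r_a beats one rated r_b under Elo."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))


def update_elo(ratings: dict[str, float], winner: str, loser: str, k: float = 32) -> None:
    # The winner gains k * (1 - expected); the loser loses the same amount.
    delta = k * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def leaderboard(matches: list[tuple[str, str]], start: float = 1500.0) -> list[tuple[str, float]]:
    """Replay (winner, loser) results in order and return ratings, best first."""
    ratings: dict[str, float] = {}
    for winner, loser in matches:
        ratings.setdefault(winner, start)
        ratings.setdefault(loser, start)
        update_elo(ratings, winner, loser)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical results, not real Colosseum data.
    games = [("model_a", "model_b"), ("model_a", "model_c"), ("model_b", "model_c")]
    for name, rating in leaderboard(games):
        print(f"{name}: {rating:.1f}")
```

With this scheme, beating a higher-rated opponent earns more points than beating a lower-rated one, and the total rating in the pool stays constant after every match.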

The LLM Colosseum introduces a groundbreaking approach to benchmarking Large Language Models (LLMs) by immersing them in real-time gameplay, notably featuring Street Fighter III battles

Observing LLMs in the Street Fighter III arena has yielded fascinating insights into their capabilities and behaviors. From adaptive strategies to unexpected tactics, these AI combatants have demonstrated a remarkable ability to navigate the complexities of real-time gameplay, showcasing their potential beyond traditional AI tasks.

You can join the LLM Colosseum

If you’re eager to get involved and run the benchmark yourself, all the necessary code and documentation are available on GitHub. This means you have the opportunity to customize prompts, introduce new LLM contenders, and delve deeper into model behaviors.

Whether you’re a gaming enthusiast or an AI aficionado, the LLM Colosseum offers a front-row seat to the action-packed world of Street Fighter III battles. Witness the clash of digital titans or even step into the arena yourself to explore the intersection of AI and gaming in this thrilling experiment.

So, grab your popcorn and prepare for an adrenaline-fueled journey where AI meets arcade classics in the ultimate battle for supremacy!


Featured image credit: Stan Girard


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.

  • About
  • Imprint
  • Contact
  • Legal & Privacy

Follow Us

  • News
    • Artificial Intelligence
    • Cybersecurity
    • DeFi & Blockchain
    • Finance
    • Gaming
    • Startups
    • Tech
  • Industry
  • Research
  • Resources
    • Articles
    • Guides
    • Case Studies
    • Glossary
    • Whitepapers
  • Newsletter
  • + More
    • Conversations
    • Events
    • About
      • About
      • Contact
      • Imprint
      • Legal & Privacy
      • Partner With Us
No Result
View All Result
Subscribe

This website uses cookies. By continuing to use this website you are giving consent to cookies being used. Visit our Privacy Policy.