Picture a digital arena where Large Language Models (LLMs) step out of their text-based comfort zone and into the electrifying world of Street Fighter III. That’s the essence of the LLM Colosseum—a clever way to benchmark LLMs.
What’s the idea?
The LLM Colosseum was conceived with a simple yet groundbreaking idea: to push the boundaries of AI beyond conventional tasks. By inviting LLMs to duke it out in Street Fighter III, its creators sought to explore the models’ adaptability and strategic prowess in a dynamic gaming environment.
Introducing LLM Colosseum ! 🔥
Evaluate LLMs quality by having them fight in realtime in Street Fighter III !
Who is the best ? @OpenAI or @MistralAI ?
Let them fight ! Open source code and ranking 👇 pic.twitter.com/GF6HOkVHIA
— Stan Girard (@_StanGirard) March 24, 2024
Behind the scenes, the Colosseum harnesses the power of emulators and APIs to recreate the fast-paced action of Street Fighter III. LLMs are tasked with controlling characters like Ken or Ryu, using their language processing abilities to make split-second decisions and execute moves within the game.
How do they play?
In the LLM Colosseum, every player is represented by an LLM, an advanced AI model capable of processing and responding to text descriptions of the game screen. This agent-based approach allows each LLM to autonomously decide its character’s next moves based on various factors such as its previous actions, the moves of its opponents, as well as its own power and health status.
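To make the idea of a text description of the game concrete, here is a minimal Python sketch of how a game state could be turned into a prompt. It is not the project's actual code: the `FighterState` class, its field names, and the prompt wording are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FighterState:
    """Hypothetical snapshot of one fighter; field names are illustrative."""
    character: str
    health: int                      # remaining health
    power: int                       # power (super meter) level
    last_moves: list[str] = field(default_factory=list)


def describe_state(me: FighterState, opponent: FighterState) -> str:
    """Turn both fighters' states into a text prompt for an LLM."""
    my_moves = ", ".join(me.last_moves) or "none"
    their_moves = ", ".join(opponent.last_moves) or "none"
    return (
        f"You are playing {me.character} against {opponent.character}.\n"
        f"Your health: {me.health}. Your power: {me.power}.\n"
        f"Opponent health: {opponent.health}.\n"
        f"Your previous moves: {my_moves}.\n"
        f"Opponent's previous moves: {their_moves}.\n"
        "Reply with the name of your next move."
    )


# Example values are invented for the demonstration.
ken = FighterState("Ken", health=120, power=2, last_moves=["Punch", "Fireball"])
ryu = FighterState("Ryu", health=95, power=0)
prompt = describe_state(ken, ryu)
print(prompt)
```

The prompt carries the same factors the article lists: the model's own previous moves, the opponent's moves, and the power and health status.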
To keep gameplay smooth and responsive, the system employs multithreading. This means several tasks can run concurrently, so the game does not have to stop while a model works out its next move, allowing for real-time interaction between the LLMs and the game environment. As a result, battles can play out dynamically instead of pausing for every decision.
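One way to picture this is a game loop that keeps ticking while each agent waits on its model in a separate thread. The Python sketch below uses only the standard library and assumes a made-up `slow_model` function in place of a real LLM call; the names, timings, and structure are illustrative and not taken from the project.

```python
import queue
import threading
import time


def slow_model(prompt: str) -> str:
    """Stand-in for a real LLM call; sleeps to imitate response latency."""
    time.sleep(0.05)
    return "Punch"


def agent_loop(name: str, moves: queue.Queue, stop: threading.Event) -> None:
    """Keep asking the model for moves until the match is over."""
    while not stop.is_set():
        move = slow_model(f"{name}: what is your next move?")
        moves.put((name, move))


def run_match(frames: int = 30, frame_time: float = 0.01) -> list:
    """Run a fake match and return (frame, player, move) tuples."""
    moves = queue.Queue()
    stop = threading.Event()
    agents = [
        threading.Thread(target=agent_loop, args=(name, moves, stop))
        for name in ("player_1", "player_2")
    ]
    for thread in agents:
        thread.start()

    applied = []
    for frame in range(frames):
        # Apply whatever moves have arrived, without blocking the game loop.
        while True:
            try:
                name, move = moves.get_nowait()
            except queue.Empty:
                break
            applied.append((frame, name, move))
        time.sleep(frame_time)  # stand-in for advancing the game by one frame

    stop.set()
    for thread in agents:
        thread.join()
    return applied


applied = run_match()
print(f"{len(applied)} moves applied over the match")
```

The queue decouples slow model responses from the frame loop: the loop never waits on a model, it only picks up moves that are already available.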
With this combination of agent-based control, multithreading, and real-time processing, the LLM Colosseum delivers an immersive gaming experience where AI entities engage in fast-paced combat, showcasing their decision-making skills and adaptability in the heat of battle.
As the virtual fighters take their positions, LLMs analyze the game state and craft their moves based on contextual prompts. Whether it’s launching a devastating super move or timing a precise counter-attack, each decision reflects the AI’s understanding of the game mechanics and its strategic approach to victory.
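A model's reply is free text, so something has to map it back onto moves the game understands. The Python sketch below shows one simple way this could work, assuming a hypothetical move list and a fallback move; the project defines its own moves and parsing, which may differ.

```python
import re

# Hypothetical move list for illustration; not the project's actual set.
VALID_MOVES = [
    "Move Closer", "Move Away", "Jump",
    "Punch", "Kick", "Fireball", "Super Attack",
]


def parse_moves(reply: str, default: str = "Move Closer") -> list[str]:
    """Extract known moves from a reply, in the order they appear."""
    found = []
    for move in VALID_MOVES:
        for match in re.finditer(re.escape(move), reply, flags=re.IGNORECASE):
            found.append((match.start(), move))
    found.sort()  # order by position in the reply
    moves = [move for _, move in found]
    # Fall back to a safe default when the reply names no known move.
    return moves or [default]


print(parse_moves("I will use Fireball and then KICK!"))
print(parse_moves("Not sure what to do."))
```

A fallback like this keeps a fighter from freezing when a model answers with something the game cannot execute.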
Who won?
In the Street Fighter III battles at the LLM Colosseum, there wasn’t a single champion. Instead, several models, including claude_3_haiku, claude_3_sonnet, and claude_2, stood out on the leaderboard. The competition was less about crowning a winner and more about understanding how different AI models perform in gaming scenarios. Each match offered insights into how these models think and make decisions in dynamic situations, making the event an exciting exploration of AI capabilities.
Observing LLMs in the Street Fighter III arena has yielded fascinating insights into their capabilities and behaviors. From adaptive strategies to unexpected tactics, these AI combatants have demonstrated a remarkable ability to navigate the complexities of real-time gameplay, showcasing their potential beyond traditional AI tasks.
You can join the LLM Colosseum
If you’re eager to get involved and run the benchmark yourself, all the necessary code and documentation are available on GitHub. This means you have the opportunity to customize prompts, introduce new LLM contenders, and delve deeper into model behaviors.
Whether you’re a gaming enthusiast or an AI aficionado, the LLM Colosseum offers a front-row seat to the action-packed world of Street Fighter III battles. Witness the clash of digital titans or even step into the arena yourself to explore the intersection of AI and gaming in this thrilling experiment.
So, grab your popcorn and prepare for an adrenaline-fueled journey where AI meets arcade classics in the ultimate battle for supremacy!
Featured image credit: Stan Girard