Large Language Models (LLMs) have transformed the way AI tackles reasoning problems, from answering tricky math questions to making sense of ambiguous human language. But there is a catch: these models often struggle when reasoning gets too complex. A single model can get stuck on a locally optimal reasoning path, missing better solutions simply because it doesn't know what it doesn't know.
A team of researchers from The Chinese University of Hong Kong and Shanghai AI Laboratory, led by Sen Yang, Yafu Li, Wai Lam, and Yu Cheng, proposes a solution: Mixture-of-Search-Agents (MOSA). This method allows multiple AI models to work together, leveraging their combined strengths to navigate complex reasoning problems. Instead of relying on just one model's perspective, MOSA enables different AI agents to explore various reasoning paths and refine each other's answers.
Their findings, presented in the study “Multi-LLM Collaborative Search for Complex Problem Solving,” show that this approach significantly improves AI accuracy in math and commonsense reasoning tasks.
Why do AI models struggle with complex reasoning?
At its core, reasoning involves breaking a problem into smaller steps and exploring different paths to find the best solution. Traditional search-based approaches, such as breadth-first search (BFS) or depth-first search (DFS), help AI navigate these paths systematically. But even with advanced techniques like Chain-of-Thought (CoT) reasoning, where models break down their thought process step by step, a single LLM can still run into limitations:
- Limited exploration: AI models tend to get stuck in familiar reasoning patterns, failing to explore alternative solutions.
- Ambiguity in language: Natural language is inherently vague, making it difficult for an AI to evaluate all possible interpretations correctly.
- Trade-off between diversity and accuracy: Raising an AI's sampling temperature (how randomly it picks its next tokens) introduces variety, but often at the cost of precision, as the sketch after this list illustrates.
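To see why that trade-off exists, consider how temperature reshapes a model's next-token distribution. The sketch below is a generic illustration of softmax sampling with temperature, not anything specific to MOSA:

```python
import math
import random

# Illustrative softmax-with-temperature sampling: higher temperature
# flattens the distribution (more diverse, less precise outputs).

def sample_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# At T=0.2 the top-scoring token dominates (accurate but repetitive);
# at T=1.5 unlikely, often wrong, continuations are sampled far more often.
```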
MOSA aims to fix these problems by assembling multiple AI models to collaborate on reasoning tasks, ensuring broader exploration while maintaining accuracy.
How does MOSA work?
MOSA builds on a well-known search technique called Monte Carlo Tree Search (MCTS), commonly used in AI game-playing strategies. In a typical MCTS setup, an AI explores different possible moves, learning from past results to improve its decision-making. MOSA enhances this process by integrating multiple LLMs into the search, each acting as an independent reasoning agent.
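To make the idea concrete, here is a minimal MCTS loop in Python with several proposer agents sharing one search tree. The `Node` structure, the stub agent interface, and the random reward function are illustrative assumptions for this sketch, not the paper's implementation; in MOSA the proposals and evaluations come from actual LLMs.

```python
import math
import random

# Minimal sketch of MCTS with multiple proposer agents sharing one tree.
# The agent interface (a callable mapping a partial trace to a next step)
# and the reward function are assumptions for illustration only.

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial reasoning trace (list of steps)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    # Upper Confidence Bound: trade off exploitation and exploration.
    if node.visits == 0:
        return float("inf")
    exploit = node.value / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore

def mcts(root_state, agents, reward_fn, iterations=100, max_depth=5):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: EACH agent proposes a next step, so the tree
        #    branches across different models, not just resampled outputs.
        if len(node.state) < max_depth:
            for agent in agents:
                node.children.append(Node(node.state + [agent(node.state)], node))
            node = random.choice(node.children)
        # 3. Evaluation: score the (partial) reasoning trace.
        reward = reward_fn(node.state)
        # 4. Backpropagation: update visit counts and values up to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Follow the most-visited child of the root.
    return max(root.children, key=lambda n: n.visits).state

# Stub "agents" and a random reward stand in for real LLM calls.
agents = [lambda s, i=i: f"model{i}:step{len(s)}" for i in range(3)]
print(mcts([], agents, reward_fn=lambda s: random.random(), iterations=50))
```

The key difference from vanilla MCTS is in step 2: branching over multiple agents rather than over repeated samples from a single model is what widens the search frontier.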
Here's how MOSA orchestrates the collaboration (a minimal code sketch follows the list):
- Diverse search exploration: Each AI agent proposes different possible reasoning paths, increasing the diversity of search directions.
- Step-by-step refinement: AI agents analyze and refine each other’s reasoning steps, reducing errors.
- Aggregated decision-making: Instead of relying on a single AI’s output, MOSA aggregates the best contributions from multiple models, ensuring more reliable conclusions.
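A rough sketch of a single collaboration round might look like the following. The proposer, refiner, and scorer callables are illustrative stand-ins for LLM calls, not the paper's actual interfaces:

```python
# One MOSA-style round: propose, cross-refine, aggregate.
# The proposer/refiner/scorer interfaces are illustrative assumptions;
# in practice each would be a call to a different LLM.

def mosa_round(question, proposers, refiners, score_fn):
    # Diverse search exploration: every proposer drafts its own answer.
    drafts = [propose(question) for propose in proposers]
    # Step-by-step refinement: rotate the drafts so that no model
    # reviews its own output.
    rotated = drafts[1:] + drafts[:1]
    refined = [refine(question, draft)
               for refine, draft in zip(refiners, rotated)]
    # Aggregated decision-making: keep the highest-scoring refinement.
    return max(refined, key=lambda answer: score_fn(question, answer))
```

Rotating the drafts is one simple way to enforce cross-review; in MOSA these roles are interleaved inside the tree search rather than run as a flat pipeline like this.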
By using multiple models with different training data and strengths, MOSA prevents any single AI from dominating the decision process, avoiding local optimization traps.
How MOSA beats single AI models
To test MOSA’s effectiveness, the researchers conducted experiments across four well-known reasoning benchmarks:
- GSM8K (grade-school math word problems)
- SVAMP (math reasoning with variation in language structures)
- MATH-500 (a challenging dataset for advanced math problems)
- StrategyQA (commonsense reasoning questions)
The results were clear: MOSA consistently outperformed both single-agent AI models and existing multi-agent baselines.
- In MATH-500, one of the toughest datasets, MOSA improved accuracy by 1.8% over previous best methods.
- When integrating multiple LLMs, MOSA showed a 1.71% improvement in overall reasoning accuracy compared to traditional single-model search.
- The more diverse the AI team, the better the results—adding more LLMs further boosted performance.
The research highlights an important trend: AI collaboration is often more effective than AI competition. Just as humans work in teams to solve complex problems, AI models can complement each other’s strengths when working together. This has profound implications for fields that require deep reasoning, including:
- Automated scientific discovery: AI collaborations could accelerate breakthroughs in materials science, drug discovery, and physics.
- Advanced tutoring systems: MOSA-like approaches could make AI-powered learning assistants more accurate and helpful.
- Legal and financial analysis: Multi-agent AI could enhance contract analysis, financial forecasting, and risk assessment by cross-checking reasoning pathways.
Can AI defend against its own mistakes?
One of MOSA's most promising aspects is its ability to catch and correct errors. A single AI model often makes mistakes confidently, which makes them hard to detect; with multiple AI agents reviewing each other's work, errors are less likely to go unnoticed. The research team also introduced a neural aggregator, a trained model that merges the best aspects of different reasoning paths into a more refined final answer.
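The aggregator itself is a trained model; as a toy stand-in, here is a frequency-weighted vote over the agents' final answers. The `(answer, score)` pair format is an assumption of this sketch:

```python
from collections import Counter

# Toy stand-in for a neural aggregator: majority-vote on the agents'
# final answers, breaking ties by average path score. The real component
# is a trained model, not a voting rule.

def aggregate(candidates):
    # candidates: list of (final_answer, path_score) pairs from the agents.
    votes = Counter(answer for answer, _ in candidates)
    top = max(votes.values())
    tied = [a for a, v in votes.items() if v == top]
    def avg_score(ans):
        scores = [s for a, s in candidates if a == ans]
        return sum(scores) / len(scores)
    return max(tied, key=avg_score)

print(aggregate([("42", 0.9), ("42", 0.7), ("41", 0.8)]))  # -> "42"
```

The actual aggregator goes further than selection: as described above, it synthesizes a refined answer from the candidate reasoning paths rather than merely picking one of them.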
Featured image credit: Anderson Rian/Unsplash