In his recent research, Evgeniy Tolstykh did not just ask an AI to draft a polite email or summarize a meeting. He challenged it to build an AAA game production department from the ground up. In a head-to-head experiment, Tolstykh tasked a Large Language Model (LLM) with generating art roadmaps, Jira structures, and production estimates, then measured the results against his human team’s output.
The findings were stark.
The AI collapsed planning timelines from weeks to hours, but that speed came with a caveat: a dangerous tendency toward “hallucinated” optimism. Tolstykh’s experiment offers a granular view of the friction between algorithmic speed and the chaotic reality of game development.
This shift isn’t happening in a vacuum. Recent industry data confirms that Tolstykh is at the vanguard of a massive structural pivot. According to a 2025 Google Cloud survey, 90% of game developers are now integrating generative AI into their workflows, and 95% report that the technology is actively reducing repetitive tasks.
Perhaps most notably for producers, 44% of developers now use AI to process information autonomously, enabling faster decision-making across the pipeline. This rapid adoption comes as 94% of respondents believe AI will ultimately be the key to reducing spiraling production costs.
The Mathematical Baseline for Creativity
One of the most striking results of the experiment was the AI’s proposal of a “BTSU” (BT-level shell unit) to standardize art complexity. While Tolstykh provided the initial concept of a “golden unit” for estimation, the AI independently formalized the math.
“The model’s key contribution was not the emergence of the idea itself, but its formalization,” Tolstykh says. “The AI independently derived the structure of the formula, justified the choice of parameters, and explained their logic without additional clarification.”
Crucially, Tolstykh withheld all project-specific data and internal estimates to test the model’s “out of the box” capabilities and respect strict NDA constraints. The AI relied on its internal statistical representations of how complex objects are decomposed across various domains to build a functional baseline.
“I treated the BTSU formula not as a discovered objective truth, but as a hypothesis generated by the model and subject to validation and calibration against real production data,” he explains.
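Tolstykh does not publish the formula itself, so the sketch below is purely illustrative of the underlying idea: expressing an asset’s cost as a weighted multiple of a reference “golden unit.” Every name, weight, and parameter here is a hypothetical placeholder, not the actual BTSU math.

```python
# Illustrative sketch only: the real BTSU formula is not published.
# All factors, weights, and the baseline figure are invented placeholders
# showing how a "golden unit" baseline for art complexity might work.

from dataclasses import dataclass

@dataclass
class AssetSpec:
    geometry_factor: float  # modeling complexity relative to the baseline unit
    texture_factor: float   # texturing/material complexity relative to baseline
    rig_factor: float       # rigging/integration complexity relative to baseline

def btsu_estimate(asset: AssetSpec,
                  weights: tuple = (0.5, 0.3, 0.2),
                  baseline_days: float = 3.0) -> float:
    """Return an effort estimate in person-days as a weighted multiple of
    the baseline ("golden") unit. The weights and baseline are hypothetical
    and would need calibration against real production data."""
    w_geo, w_tex, w_rig = weights
    multiplier = (w_geo * asset.geometry_factor
                  + w_tex * asset.texture_factor
                  + w_rig * asset.rig_factor)
    return multiplier * baseline_days

# Example: an asset twice as complex to model, equal in texturing,
# and slightly simpler to rig than the golden unit.
print(btsu_estimate(AssetSpec(2.0, 1.0, 0.8)))  # 1.46 x 3.0 = ~4.4 person-days
```

The example deliberately mirrors Tolstykh’s own caveat: the structure is a hypothesis, and the weights only become meaningful once validated against real production data.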
Navigating the “Optimism Trap”
Initial AI estimates were 1.5 times lower than those of the human team. Even after refining the prompts with reference data, Tolstykh maintained a 15% contingency buffer, a staple of his production practice. He views this not as a fix for “AI optimism,” but as a necessary hedge against the inherent chaos of live development.
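To make the arithmetic concrete, here is a minimal sketch with invented task sizes. The roughly 1.5x gap and the 15% buffer come from the experiment; the 10- and 15-day figures are placeholders, and keeping the two adjustments separate reflects Tolstykh’s point that the buffer is a standing hedge, not an optimism fix.

```python
# Minimal sketch of the numbers above, for illustration only. The initial
# raw AI figures were ~1.5x lower than the human team's; the 15% buffer
# is applied on top of whatever estimate survives calibration.

def with_contingency(estimate_days: float, buffer: float = 0.15) -> float:
    """Pad a calibrated estimate with the standing 15% contingency buffer."""
    return estimate_days * (1.0 + buffer)

# Example with placeholder figures: a raw AI estimate of 10 days against
# a 15-day human baseline (the ~1.5x gap), buffered after calibration.
human_baseline_days = 15.0
raw_ai_days = 10.0  # ~1.5x lower than the human figure
print(with_contingency(human_baseline_days))  # 15 * 1.15 = 17.25 days
```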
However, Tolstykh argues that the AI’s “clean room” perspective has its own value. “A clean, laboratory-like perspective from a vacuum can be an interesting lens for looking at your processes from the outside,” he says. It forces producers to ask what a project could look like if inevitable real-world frictions were removed.
The Human Element in an Agentic Future
As Tolstykh prepares to explore “Agentic AI” – systems designed to act with more autonomy – he acknowledges the existential dread felt by many project managers. Yet, he insists that the core of the producer role remains safe from automation.
“AI is not capable of replacing humans in what matters most: making decisions and taking responsibility under uncertainty,” Tolstykh says. “Conflict management, stakeholder communication, and balancing quality, deadlines, and team morale involve not only analysis, but also trust, context, and personal relationships built over time.”
He likens the future of production to aviation. Autopilots have existed for decades, but the captain still commands the aircraft. “Agentic AI can become a strong assistant in planning and forecasting, but the final call, especially when the stakes are high and there is no single correct answer, still belongs to a human,” he adds.
When Hallucination Becomes Innovation
The AI occasionally produced unprompted suggestions, such as a “Titan Armor Production Template,” which Tolstykh’s team eventually validated as a solid strategy. This raises a difficult question: how do you distinguish a “bad guess” from a “brilliant hypothesis”?
“If an AI suggestion cannot fit into a real pipeline, does not hold up in practice, or has no clear internal logic, then it’s a bad guess,” Tolstykh explains. He mentors junior producers to focus on verification rather than blind trust. “Our main superpower is fact-checking; we must verify everything that is said.”
For a “hallucination” to become a feature, it must be understandable and testable. In Tolstykh’s view, AI is most useful when it regularly offers hypotheses that can be safely tested and rejected without breaking the system.
The Psychological Safety of Simple Tools
While the industry often chases expensive, enterprise-grade AI suites, Tolstykh advocates for using the simplest consumer-grade LLMs. He finds that these accessible tools lower the barrier to entry for skeptical veterans.
“Simple, accessible tools let people experiment without feeling risk or losing control,” Tolstykh says. “The focus shifts from ‘Should we implement this?’ to ‘How can this help me with my specific task?’”
By stripping away the “top-down” pressure of complex integrations, these tools feel psychologically safer. For the veteran developer, a tool that quietly helps with concrete tasks is far easier to accept than one promising a revolution.