Anthropic has not published any technical paper on Claude Mythos, prompting speculative theories within the research community. Kye Gomez released an open-source project named OpenMythos on GitHub, which aims to theoretically reconstruct the Claude Mythos architecture using first principles and peer-reviewed research.
OpenMythos is neither a leaked model nor a model distillation; instead, it serves as a falsifiable hypothesis represented in code. The core assertion is that Claude Mythos fits into the category of Recurrent-Depth Transformers (RDTs), also known as Looped Transformers, which differ from standard transformer architectures.
In standard transformers, models utilize a sequence of unique layers with independent weights to process inputs. In contrast, RDTs apply a fixed set of weights iteratively across multiple loop steps during a single forward pass. This approach enables models to improve their internal representations through repeated computations instead of primarily relying on the number of parameters.
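The contrast between the two designs can be sketched in a few lines. This is a toy illustration in plain Python rather than PyTorch: the scalar "layers", the helper names (`make_layer`, `standard_forward`, `looped_forward`), and the weight values are all hypothetical and not taken from OpenMythos.

```python
# Illustrative only: scalar "layers" stand in for full transformer blocks.

def make_layer(weight, bias):
    """Return a toy layer: one affine map followed by a ReLU."""
    def layer(x):
        return max(0.0, weight * x + bias)
    return layer

def standard_forward(x, layers):
    """Standard stack: every layer has its own weights and runs once."""
    for layer in layers:
        x = layer(x)
    return x

def looped_forward(x, shared_layer, n_loops):
    """Recurrent-depth stack: one set of weights is reused n_loops times."""
    for _ in range(n_loops):
        x = shared_layer(x)
    return x

# Four distinct layers cost four sets of weights (2 numbers each here).
unique_params = [(0.9, 0.1), (1.1, -0.2), (0.8, 0.3), (1.0, 0.0)]
standard_out = standard_forward(1.0, [make_layer(w, b) for w, b in unique_params])
standard_param_count = 2 * len(unique_params)

# The looped model reaches the same depth of 4 with a single set of weights.
looped_out = looped_forward(1.0, make_layer(0.9, 0.1), n_loops=4)
looped_param_count = 2
```

Both models apply four layer evaluations per input, but the looped version stores a quarter of the weights. That is the trade the RDT hypothesis relies on: depth comes from iteration rather than from additional parameters.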
Kye Gomez (@KyeGomezB) announced the project on April 19, 2026:
"Introducing OpenMythos. An open-source, first-principles theoretical reconstruction of Claude Mythos, implemented in PyTorch. The architecture instantiates a looped transformer with a Mixture-of-Experts (MoE) routing mechanism, enabling iterative depth via weight sharing and…" (pic.twitter.com/YLvCid6CAr)
The OpenMythos architecture comprises three parts: Prelude, Recurrent Block, and Coda. The Prelude and Coda are standard transformer layers that execute once, while the Recurrent Block is applied repeatedly, up to 16 times. At each loop iteration, the hidden state is updated using learned matrices and the encoded input.
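A minimal sketch of this three-stage control flow follows, in plain Python so it runs without dependencies. The exact update rule is an assumption: the scalars `a` and `b` stand in for the learned matrices, `tanh` stands in for the transformer block, and starting the hidden state from the encoded input is a guess made for illustration. None of the function names come from the OpenMythos code.

```python
import math

MAX_LOOPS = 16  # the stated upper bound on recurrent iterations

def prelude(tokens):
    """Runs once: encode the raw input (toy: scale each value)."""
    return [0.5 * t for t in tokens]

def recurrent_block(h, e, a=0.5, b=0.3):
    """One loop step.

    ASSUMED update form: h_next = a*h + b*e + 0.1*tanh(h + e).
    a and b stand in for learned matrices; tanh stands in for the
    transformer block inside the loop.
    """
    return [a * hi + b * ei + 0.1 * math.tanh(hi + ei) for hi, ei in zip(h, e)]

def coda(h):
    """Runs once: map the final hidden state to an output (toy: sum)."""
    return sum(h)

def forward(tokens, n_loops=MAX_LOOPS):
    if not 1 <= n_loops <= MAX_LOOPS:
        raise ValueError("n_loops must be between 1 and MAX_LOOPS")
    e = prelude(tokens)       # encoded input, computed once
    h = list(e)               # initial hidden state (assumption: start from e)
    for _ in range(n_loops):  # the same weights are used at every iteration
        h = recurrent_block(h, e)
    return coda(h)

shallow = forward([1.0, 2.0], n_loops=1)
deep = forward([1.0, 2.0], n_loops=16)
```

Note that the encoded input `e` is re-injected at every step, so the loop refines the hidden state without losing sight of the original input. The number of loops is a runtime argument, not a property of the weights.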
The Recurrent Block employs a Mixture-of-Experts (MoE) layer with sparse activation, so only a subset of experts runs for each token. Attention uses Multi-head Latent Attention (MLA), which significantly reduces memory usage by caching compressed key-value latents instead of full key and value tensors.
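Both ideas can be illustrated with a short sketch. The router below uses top-k selection with a softmax over the selected experts, which is one common MoE routing scheme and is assumed here rather than confirmed for OpenMythos. The cache comparison is simplified arithmetic: the dimensions are hypothetical, and real latent-attention designs may cache extra terms (for example positional components) that are ignored here.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: v / total for i, v in exps.items()}

def moe_layer(x, experts, router_logits, k=2):
    """Sparse MoE: only the k routed experts are evaluated for this input."""
    gates = top_k_route(router_logits, k)
    return sum(gate * experts[i](x) for i, gate in gates.items())

def kv_cache_floats(seq_len, n_heads, head_dim):
    """Standard attention caches full keys and values for every head."""
    return 2 * seq_len * n_heads * head_dim

def latent_cache_floats(seq_len, latent_dim):
    """Latent attention caches one compressed vector per position instead."""
    return seq_len * latent_dim

# Four toy experts; the router scores favor experts 1 and 3 equally.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]
gates = top_k_route(logits, k=2)
y = moe_layer(3.0, experts, logits, k=2)

# Hypothetical sizes chosen only to show the scale of the saving.
full = kv_cache_floats(seq_len=4096, n_heads=16, head_dim=128)
compressed = latent_cache_floats(seq_len=4096, latent_dim=512)
```

With these illustrative sizes the compressed cache is one eighth of the full one, and only two of the four experts are ever called for the input.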
OpenMythos represents a shift in reasoning approach, operating entirely in a continuous latent space. Reasoning therefore occurs without generating intermediate tokens, in contrast to traditional chain-of-thought prompting. Each loop iteration in an RDT corresponds to a single reasoning step but operates over continuous vectors, which can encode multiple possible outcomes simultaneously.
A key advantage of RDTs is the ability to extend reasoning depth by running additional loops at inference time, whereas standard transformers struggle with longer reasoning chains. OpenMythos addresses the historical stability issues of looped models with a Linear Time-Invariant (LTI) constraint that keeps hidden states stable across iterations.
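Why such a constraint matters can be seen in a scalar version of a linear recurrence, h <- a*h + b*e. If |a| < 1 the state settles at the fixed point b*e / (1 - a); if |a| > 1 it grows without bound as loops are added. The sketch below assumes one common way to enforce the stable case, mapping an unconstrained parameter through exp(-softplus(.)). Whether OpenMythos uses this exact parameterization is not stated here, and the function names are hypothetical.

```python
import math

def stable_decay(raw_param):
    """Map a real number to a decay factor that is mathematically in (0, 1).

    Uses exp(-softplus(raw_param)); intended for moderate inputs only.
    """
    softplus = math.log1p(math.exp(raw_param))
    return math.exp(-softplus)

def iterate(a, b, e, n_loops, h0=0.0):
    """Run the linear recurrence h <- a*h + b*e for n_loops steps."""
    h = h0
    for _ in range(n_loops):
        h = a * h + b * e
    return h

# Constrained case: raw parameter 0.0 gives a = exp(-ln 2) = 0.5.
a_stable = stable_decay(0.0)
h_stable = iterate(a_stable, b=1.0, e=1.0, n_loops=64)
fixed_point = 1.0 / (1.0 - a_stable)  # b*e / (1 - a)

# Unconstrained case with |a| > 1: the state explodes with loop count.
h_unstable = iterate(1.5, b=1.0, e=1.0, n_loops=64)
```

In the matrix case the analogous condition is that the spectral radius of the state matrix stays below one, which is what lets a looped model run extra iterations at inference time without the hidden state diverging.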
Furthermore, OpenMythos tackles overthinking by incorporating Adaptive Computation Time (ACT), which dynamically determines when to halt processing based on the complexity of the task. Depth-wise LoRA adapters are also integrated to allow for behavioral differentiation at each iteration without significantly increasing the total parameters.
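The halting rule can be sketched in the style of the original ACT formulation: each step emits a halting probability, the loop stops once the cumulative probability reaches 1 - eps, and the output is a probability-weighted mix of the intermediate states. This is a scalar toy with hypothetical names and constant halting probabilities; in a real model the halting probability would be predicted from the hidden state. The second part is simple arithmetic for depth-wise LoRA, with made-up sizes.

```python
def act_loop(step_fn, halt_fn, h, max_loops=16, eps=0.01):
    """Iterate step_fn until cumulative halting probability reaches 1 - eps.

    Returns (weighted_state, loops_used). The final step receives the
    remaining probability mass so the weights sum to 1.
    """
    if max_loops < 1:
        raise ValueError("max_loops must be at least 1")
    cumulative = 0.0
    weighted = 0.0
    for n in range(1, max_loops + 1):
        h = step_fn(h)
        p = halt_fn(h)
        if cumulative + p >= 1.0 - eps or n == max_loops:
            weighted += (1.0 - cumulative) * h  # remainder goes to last step
            return weighted, n
        cumulative += p
        weighted += p * h

def step(h):
    """Toy recurrent update that converges toward 2.0."""
    return 0.5 * h + 1.0

# "Easy" input: a confident halting signal stops the loop after 2 steps.
easy_out, easy_loops = act_loop(step, lambda h: 0.6, 0.0)
# "Hard" input: a weak halting signal lets the loop run for 10 steps.
hard_out, hard_loops = act_loop(step, lambda h: 0.1, 0.0)

def lora_params(d_in, d_out, rank, n_loops):
    """Parameters for one low-rank adapter pair (A, B) per loop iteration."""
    return n_loops * rank * (d_in + d_out)

# Hypothetical sizes: per-loop adapters vs. a full weight copy per loop.
adapter_total = lora_params(d_in=1024, d_out=1024, rank=8, n_loops=16)
full_copy_total = 16 * 1024 * 1024
```

With these numbers the adapters add 64 times fewer parameters than giving each of the 16 iterations its own full weight matrix, which illustrates how per-iteration behavior can differ while the bulk of the weights remain shared.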
The project cites empirical evidence for the parameter efficiency of RDTs. According to the Parcae paper, an RDT with 770 million parameters can achieve results comparable to a 1.3 billion parameter standard transformer trained on the same data, suggesting a paradigm for AI development that emphasizes reasoning depth rather than sheer parameter count.
OpenMythos contributes four primary research artifacts: a configurable PyTorch implementation of the RDT hypothesis, LTI-stable recurrent injection, depth-wise LoRA adapters, and a research baseline for exploring looped transformer dynamics. This project offers a tangible implementation for the research community to explore an architectural class that may represent a key advancement in AI capabilities.





