DeepSeek Reveals MODEL1 Architecture In GitHub Update Ahead Of V4

DeepSeek revealed details of a new model designated “MODEL1” through recent updates to its FlashMLA codebase on GitHub. The identifier “MODEL1” appears 28 times across 114 files within the repository, marking the disclosure on the one-year anniversary of the company’s R1 release. This development follows reports that DeepSeek plans to release its next-generation V4 model around mid-February 2026, coinciding with the Lunar New Year.

Analysis of the updated codebase by developers indicates MODEL1 features a distinct architecture from DeepSeek-V3.2, codenamed “V32” in the repository. Code logic discrepancies suggest changes in key-value cache layout, sparsity handling, and FP8 data format decoding, pointing to restructuring for memory optimization and computational efficiency. Researchers on Reddit’s LocalLLaMA community noted the FlashMLA source code update added extensive MODEL1 support, including compatibility with Nvidia’s forthcoming Blackwell architecture (SM100) and current Hopper chips. The changes reportedly show MODEL1 reverting to a unified 512-standard dimension and introducing “Value Vector Position Awareness” features, alongside potential implementations of DeepSeek’s recently published “Engram” conditional memory system.

The FlashMLA repository, which houses DeepSeek’s Multi-Head Latent Attention decoding kernel optimized for Nvidia Hopper GPUs, was the source of the technical clues. DeepSeek’s V4 model is expected to integrate the Engram architecture, which facilitates efficient retrieval from contexts exceeding one million tokens by utilizing a lookup system for foundational facts rather than recalculating them through computation. Internal tests by DeepSeek employees reportedly suggest V4 could outperform rival models from Anthropic and OpenAI on coding benchmarks, particularly with long code prompts.

The MODEL1 revelation occurs as DeepSeek approaches one year since its R1 debut in January 2025. The R1 release resulted in a $593 billion reduction in Nvidia’s market value on a single day, according to ITPro. DeepSeek’s R1 model reportedly cost under $6 million to train and achieved performance on par with or exceeding OpenAI’s o1 model on math and coding benchmarks. The company subsequently released V3.1 in August and V3.2 in December, with V3.2 described as offering performance equivalent to OpenAI’s GPT-5. DeepSeek has not officially commented on MODEL1 or confirmed specific release timing for V4.

Featured image credit