A new open-source model named DeepSeek-OCR has been released, challenging a long-standing assumption about how large models ingest text. The model, open-sourced yesterday afternoon, has risen rapidly in the AI community, gaining over 4,000 stars on GitHub overnight. At the core of DeepSeek-OCR is a novel visual approach to handling text that targets one of the biggest challenges in AI: long-context efficiency.
How DeepSeek-OCR changes the game
The new DeepSeek-OCR model is not just another text-reading tool. Its power lies in its ability to compress information. According to its creators, the model can take a 1,000-word article and compress it into just 100 visual tokens. This represents a staggering tenfold compression ratio with 97% accuracy. This efficiency is remarkable; a single NVIDIA A100 GPU can process 200,000 pages of data per day using the DeepSeek-OCR method. This new processing approach could signal a significant shift in the input methods used for large models.
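The headline numbers are easy to work through. A quick sketch of the arithmetic (the tenfold ratio comes from the article; the rough one-token-per-word assumption for plain text is ours, for illustration only):

```python
def vision_token_estimate(text_tokens, compression_ratio=10):
    """Estimate vision tokens needed under the claimed ~10x optical compression."""
    return text_tokens // compression_ratio

# A ~1,000-token article compresses to ~100 vision tokens at the claimed ratio.
print(vision_token_estimate(1000))  # 100

# Context-window impact: a 128k-token budget spent on vision tokens would
# cover roughly 1.28M tokens' worth of original text at the same ratio.
effective_context = 128_000 * 10
print(effective_context)  # 1280000
```

This is why the compression claim matters beyond OCR: if text can be represented in a tenth the tokens, the same context window effectively holds ten times as much material.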
The rapid traction of DeepSeek-OCR was amplified by high-profile endorsements. Andrej Karpathy, a founding member of OpenAI and former Director of AI at Tesla, shared his excitement about the paper. He called DeepSeek-OCR a “good OCR model” and highlighted its more “interesting part”: the concept of a computer vision AI “masquerading as a natural language person.”
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language… https://t.co/AxRXBdoO0F
— Andrej Karpathy (@karpathy) October 20, 2025
Karpathy believes this visual-first method is a superior input for large language models. He proposed that LLMs should use images as their primary input, and even when processing plain text, they should render it into an image first. In his view, this would lead to much higher information compression and a more generalized information flow.
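To make the “render text into an image first” idea concrete, here is a purely illustrative sketch using Pillow. This is not DeepSeek-OCR’s actual rendering pipeline; the canvas width, font, and naive character-count wrapping are all our own assumptions:

```python
from PIL import Image, ImageDraw, ImageFont

def render_text_to_image(text, width=800, line_height=18, margin=8):
    """Render plain text onto a white canvas, wrapping naively by character count."""
    font = ImageFont.load_default()
    chars_per_line = max(1, (width - 2 * margin) // 8)  # default bitmap font is ~8 px wide
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    height = 2 * margin + len(lines) * line_height
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    for n, line in enumerate(lines):
        draw.text((margin, margin + n * line_height), line, fill="black", font=font)
    return img

# The resulting image, not the raw character string, would be the model's input.
img = render_text_to_image("The quick brown fox jumps over the lazy dog. " * 20)
```

Under Karpathy’s proposal, a vision encoder would then compress this image into far fewer tokens than a tokenizer would produce from the underlying string.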
Karpathy also emphasized that the DeepSeek-OCR approach could do away with traditional tokenizers. He argued that tokenizers are “ugly and standalone,” introduce Unicode and byte-encoding issues, and can even increase security risks.
He views OCR as just one of many visual-text tasks, suggesting that text-to-text tasks could be converted to visual-text tasks, but not the other way around. This sentiment was echoed by Saining Xie, an assistant professor at New York University, who agreed with Karpathy’s views on integrating computer vision and natural language processing.
How to access DeepSeek-OCR
The DeepSeek-OCR model is available as an open-source project on GitHub and Hugging Face under the name deepseek-ai/DeepSeek-OCR. The model, which has 3 billion parameters, is available for download and use with the Hugging Face transformers library. The creators have provided code examples for inference on NVIDIA GPUs, and the repository also includes guidance for PDF processing and model acceleration using vLLM.
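A minimal loading sketch with the Hugging Face transformers library is below. The repo id comes from the article; the use of `trust_remote_code` and the `model.infer(...)` call are assumptions based on how custom Hugging Face models are typically shipped, so check the model card and repository README for the exact API:

```python
MODEL_ID = "deepseek-ai/DeepSeek-OCR"
# Prompt format is an assumption for illustration; see the model card for the real one.
PROMPT = "<image>\nConvert the document to markdown."

def run_ocr(image_file):
    # Heavy imports kept inside the function so the constants above import cheaply.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = model.eval().cuda().to(torch.bfloat16)  # NVIDIA GPU, per the repo's examples
    # The repository ships its own inference entry point via trust_remote_code; the
    # method name and signature here are assumptions -- consult the README.
    return model.infer(tokenizer, prompt=PROMPT, image_file=image_file)

# run_ocr("page.png")  # uncomment to run; requires an NVIDIA GPU and the model weights
```

For batch PDF processing and higher throughput, the repository also documents a vLLM-based path, which is the route behind the 200,000-pages-per-day figure.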