Implicit Caching Aims To Slash Gemini API Costs By 75%

This new system automatically enables cost savings when a Gemini API request to the 2.5 Pro or 2.5 Flash models shares a common prefix with a prior request.

Google has launched a new feature in its Gemini API called “implicit caching,” which the company claims can reduce costs by 75% for third-party developers using its latest AI models, Gemini 2.5 Pro and 2.5 Flash.

The feature automatically passes on cost savings when a Gemini API request hits a cache, removing the manual configuration that the earlier explicit caching method required. According to Google, implicit caching is triggered when a request shares a common prefix with a previous request. The minimum prompt token count is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro.
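To make the two conditions concrete, here is a minimal Python sketch, not Google's implementation. It assumes prompts are already split into tokens (the real tokenizer is not described here), and the dictionary keys and function names are illustrative labels rather than confirmed API identifiers. Only the two thresholds come from Google's stated figures.

```python
# Minimum prompt token counts as reported by Google.
# The dictionary keys are illustrative labels, not confirmed API model IDs.
MIN_PROMPT_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def shared_prefix_len(previous_tokens, current_tokens):
    """Count how many leading tokens two requests have in common."""
    count = 0
    for prev, cur in zip(previous_tokens, current_tokens):
        if prev != cur:
            break
        count += 1
    return count


def meets_minimum(model, prompt_token_count):
    """Check a prompt against the reported minimum size for the model."""
    return prompt_token_count >= MIN_PROMPT_TOKENS[model]
```

A request that fails `meets_minimum` would not qualify regardless of how much of its prefix matches an earlier request. How long the shared prefix must be for a hit is not stated, so the sketch only measures it.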

Logan Kilpatrick, a member of the Gemini team, announced the launch on May 8, 2025, stating that the feature can deliver significant cost savings for developers. Google recommends that developers place repetitive context at the beginning of requests and append changing context at the end to increase the chances of implicit cache hits.
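Google's ordering advice can be illustrated with a short Python sketch. The function name, the separator, and the sample strings are hypothetical; the only point is that the stable context comes first, so consecutive requests start with identical text.

```python
import os


def build_prompt(stable_context, changing_input):
    """Put the repeated context first and the per-request part last."""
    return stable_context + "\n\n" + changing_input


# Hypothetical example: the same reference document with two different questions.
manual = "Product manual: ... (long, unchanging reference text) ..."
first = build_prompt(manual, "How do I reset the device?")
second = build_prompt(manual, "What does the red light mean?")

# os.path.commonprefix compares strings character by character.
shared = os.path.commonprefix([first, second])
print(len(shared) >= len(manual))  # the whole stable context is shared
```

If the question were placed before the manual instead, the two prompts would diverge within the first few words and share almost no prefix.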

Caching is a widely adopted practice in the AI industry that reuses frequently accessed or pre-computed data to cut down on computing requirements and costs. Google’s previous explicit caching method required developers to define high-frequency prompts manually, which meant extra work and, for some users, surprisingly large API bills.

Some developers had expressed dissatisfaction with the explicit caching implementation for Gemini 2.5 Pro, prompting the Gemini team to apologize and pledge to make changes. The new implicit caching feature addresses these concerns by automating the caching process and passing on cost savings to developers when a cache hit occurs.

While Google claims that implicit caching can deliver 75% cost savings, the company did not provide third-party verification of the feature’s effectiveness. Actual savings may therefore vary depending on how developers structure their requests.
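A rough back-of-the-envelope calculation shows why the realized figure can fall short of the headline number. This Python sketch assumes the 75% discount applies only to the portion of the prompt served from cache and ignores output tokens; neither detail is confirmed here, and the function name is illustrative.

```python
def estimated_input_savings(prompt_tokens, cached_tokens, discount=0.75):
    """Fraction of input cost saved, assuming the discount covers cached tokens only."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    return discount * cached_tokens / prompt_tokens


# Hypothetical request: 10,000 prompt tokens, of which 8,000 hit the cache.
print(estimated_input_savings(10_000, 8_000))
```

Under these assumptions, a request with 80% of its prompt cached saves about 60% on input cost, and the full 75% is reached only when the entire prompt is a cache hit.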
