Dataconomy

Implicit caching aims to slash Gemini API costs by 75%

This new system automatically enables cost savings when a Gemini API request to the 2.5 Pro or 2.5 Flash models shares a common prefix with a prior request.

by Kerem Gülen
May 9, 2025
in Artificial Intelligence, News

Google has launched a new feature in its Gemini API called “implicit caching,” which the company claims can reduce costs by 75% for third-party developers using its latest AI models, Gemini 2.5 Pro and 2.5 Flash.

The feature applies cost savings automatically whenever a Gemini API request hits a cache, removing the manual configuration that the earlier explicit caching method required. According to Google, implicit caching is triggered when a request shares a common prefix with a previous request. The minimum prompt token count is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro.
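The conditions described above can be sketched as a small client-side check. This is only an illustration: the function names are hypothetical, tokens are represented as plain lists, and the check cannot know what Google's service has actually cached. It only estimates how much of a new prompt repeats the previous one, and whether the prompt meets the reported minimum length.

```python
# Minimum prompt token counts per model, as reported in the article.
MIN_PROMPT_TOKENS = {"2.5 Flash": 1024, "2.5 Pro": 2048}


def shared_prefix_length(previous, current):
    """Count how many leading tokens two token sequences have in common."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


def reusable_prefix(model, previous, current):
    """Return the shared prefix length, or 0 if the prompt is too short.

    Assumption: this is a client-side estimate only. It does not know
    what the service has actually cached.
    """
    if len(current) < MIN_PROMPT_TOKENS[model]:
        return 0
    return shared_prefix_length(previous, current)
```

A prompt of 1,600 tokens that repeats the first 1,500 tokens of an earlier request would qualify under the 2.5 Flash threshold in this sketch, but not under the 2.5 Pro threshold.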

Logan Kilpatrick, a member of the Gemini team, announced the launch on May 8, 2025, stating that the feature can deliver significant cost savings for developers. Google recommends that developers place repetitive context at the beginning of requests and append changing context at the end to increase the chances of implicit cache hits.
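Google's ordering advice can be illustrated with a minimal prompt builder. The helper name and the sample strings are hypothetical, and no Gemini API call is made here. The point is only that the repeated context comes first, so consecutive prompts begin with identical text.

```python
def build_prompt(shared_context, question):
    """Put the repeated context first and the changing part last."""
    return f"{shared_context}\n\n{question}"


# Hypothetical example: the same long document, two different questions.
shared_context = "PRODUCT MANUAL\n" + "Section text. " * 500
first = build_prompt(shared_context, "How do I reset the device?")
second = build_prompt(shared_context, "What does the red light mean?")

# Both prompts start with the same prefix, which is the condition
# the article describes for an implicit cache hit.
same_prefix = first.startswith(shared_context) and second.startswith(shared_context)
```

Placing the question before the document would change the very first characters of every request, leaving no common prefix to match.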

Caching is a widely adopted practice in the AI industry that reuses frequently accessed or pre-computed data to cut down on computing requirements and costs. Google’s previous explicit caching method required developers to define high-frequency prompts manually, which meant extra work and, for some users, surprisingly large API bills.

Some developers had expressed dissatisfaction with the explicit caching implementation for Gemini 2.5 Pro, prompting the Gemini team to apologize and pledge to make changes. The new implicit caching feature addresses these concerns by automating the caching process and passing on cost savings to developers when a cache hit occurs.

While Google claims that implicit caching can deliver 75% cost savings, the company did not provide third-party verification of the feature’s effectiveness. As such, the actual cost savings may vary depending on how developers use the feature.
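A rough calculation shows why realized savings can differ from the headline figure. The sketch below assumes that the 75% discount applies only to prompt tokens served from the cache, which the article does not spell out. The function name and the price used in the example are hypothetical.

```python
def estimated_input_cost(prompt_tokens, cached_tokens, price_per_million, discount=0.75):
    """Estimate prompt cost when part of the prompt is served from cache.

    Assumption: only cached tokens are discounted; all other prompt
    tokens are billed at the full (hypothetical) price.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    billable = uncached + cached_tokens * (1 - discount)
    return billable * price_per_million / 1_000_000


# Hypothetical request: 10,000 prompt tokens, 8,000 of them cached.
full_price = estimated_input_cost(10_000, 0, price_per_million=1.0)
with_cache = estimated_input_cost(10_000, 8_000, price_per_million=1.0)
savings = 1 - with_cache / full_price
```

Under these assumptions the overall saving on this request is about 60%, not 75%, because the uncached 2,000 tokens are still billed in full. Requests that miss the cache entirely would see no saving at all.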


Tags: API, Gemini, Google


COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
