Google’s new caching feature for Gemini 2.5 aims to reduce costs by up to 75 percent


Google has introduced “implicit caching” for Gemini 2.5, which it says can cut developer costs by as much as 75 percent. The feature automatically detects and stores recurring content, so repeated prompts are processed only once. According to Google, this can yield significant savings over the earlier explicit caching method, in which developers had to set up the cache themselves. To get the most out of it, Google recommends placing the stable part of a prompt, such as system instructions, at the start, and appending user-specific input, such as questions, afterwards. Implicit caching applies to Gemini 2.5 Flash for prompts of at least 1,024 tokens, and to the Pro versions for prompts of at least 2,048 tokens. More details and best practices are available in the Gemini API documentation.
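The prompt-ordering advice and the token minimums can be illustrated with a small Python sketch. It does not call the Gemini API; the function names (`build_prompt`, `shared_prefix_len`, `meets_minimum`, `estimated_cost_ratio`) are hypothetical helpers written for this illustration. The cost estimate assumes the 75 percent reduction applies only to the cached portion of the input, which the article does not spell out, and token counts are assumed to come from the API rather than being computed here.

```python
# Minimum prompt sizes for implicit caching, as stated in the article.
MIN_TOKENS = {"flash": 1024, "pro": 2048}


def build_prompt(stable_prefix: str, user_input: str) -> str:
    """Put the unchanging part first so repeated requests share a common prefix."""
    return f"{stable_prefix}\n\n{user_input}"


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that two prompts have in common."""
    n = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        n += 1
    return n


def meets_minimum(token_count: int, model_family: str) -> bool:
    """Check a prompt's token count against the article's stated minimum."""
    return token_count >= MIN_TOKENS[model_family]


def estimated_cost_ratio(cached_tokens: int, total_tokens: int,
                         discount: float = 0.75) -> float:
    """Fraction of the undiscounted input cost still paid.

    Assumption (not from the article): the discount applies only to
    the cached tokens, and all input tokens otherwise cost the same.
    """
    if total_tokens <= 0 or not 0 <= cached_tokens <= total_tokens:
        raise ValueError("token counts out of range")
    uncached = total_tokens - cached_tokens
    return (uncached + cached_tokens * (1 - discount)) / total_tokens


if __name__ == "__main__":
    system = "You are a support assistant for an example product."
    first = build_prompt(system, "What is the refund policy?")
    second = build_prompt(system, "How do I reset my password?")
    # Both prompts start with the same system instructions.
    print(shared_prefix_len(first, second) >= len(system))
    print(meets_minimum(1500, "flash"), meets_minimum(1500, "pro"))
    print(estimated_cost_ratio(3000, 4000))
```

If the user question came first instead, the two prompts would diverge from the first character and share no reusable prefix, which is why the stable content belongs at the start.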
