
Why your LLM bill is exploding, and how semantic caching can cut it by 73%

Source link : https://tech365.info/why-your-llm-invoice-is-exploding-and-the-way-semantic-caching-can-lower-it-by-73/

Our LLM API bill was growing 30% month-over-month. Traffic was growing, but not that fast. When I analyzed our query logs, I found the real problem: users ask the same questions in different ways.

“What’s your return policy?”, “How do I return something?”, and “Can I get a refund?” were all hitting our LLM separately, producing nearly identical responses, each incurring full API costs.

Exact-match caching, the obvious first solution, captured only 18% of these redundant calls. The same semantic question, phrased differently, bypassed the cache entirely.

So I implemented semantic caching based on what queries mean, not how they are worded. After rolling it out, our cache hit rate rose to 67%, cutting LLM API costs by 73%. But getting there requires solving problems that naive implementations miss.
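To make the idea concrete, here is a minimal sketch of a semantic cache, not the article's production code: each query is embedded, and a lookup counts as a hit when the cosine similarity to a previously cached query clears a threshold. The embed_fn callable and the 0.85 threshold are illustrative assumptions, not values taken from the article.

import numpy as np

class SemanticCache:
    """Toy semantic cache: keyed by query meaning (embeddings), not query text."""

    def __init__(self, embed_fn, threshold=0.85):
        self.embed_fn = embed_fn    # any text -> vector function (assumed, e.g. an embedding API)
        self.threshold = threshold  # cosine-similarity cutoff for a hit (assumed value)
        self.entries = []           # list of (embedding, cached_response) pairs

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two vectors
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def get(self, query):
        vec = np.asarray(self.embed_fn(query), dtype=float)
        for cached_vec, response in self.entries:
            if self._cosine(vec, cached_vec) >= self.threshold:
                return response     # semantic hit: reuse the earlier answer
        return None                 # miss: caller falls through to the LLM

    def put(self, query, response):
        vec = np.asarray(self.embed_fn(query), dtype=float)
        self.entries.append((vec, response))

On a miss, the caller queries the LLM and stores the result with put(), so a later “Can I get a refund?” can be served from the entry created for “What’s your return policy?” without another API call.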

Why exact-match caching falls short

Traditional caching uses the query text as the cache key. This works when queries are identical:

# Exact-match caching: the raw query text is the cache key
def lookup_exact(query_text, cache):
    cache_key = hash(query_text)
    if cache_key in cache:
        return cache[cache_key]
    return None

But users don’t phrase questions identically. My analysis of 100,000 production queries found:

Only 18% were exact duplicates of earlier queries

47% were semantically similar to earlier queries (same intent, different wording)

35% were genuinely novel queries

That 47% represented massive cost savings…
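The article does not show its analysis code, but as an illustration, a breakdown like the one above could be computed by labeling each logged query as an exact duplicate, a semantic near-duplicate, or novel. The embed_fn and the 0.85 threshold are again assumptions, and the linear scan below would need a vector index at 100,000-query scale.

import hashlib
import numpy as np

def classify_queries(queries, embed_fn, threshold=0.85):
    """Bucket logged queries into exact duplicates, semantic near-duplicates, and novel queries."""
    seen_hashes = set()   # hashes of normalized query text seen so far
    seen_vectors = []     # embeddings of first-seen queries
    counts = {"exact": 0, "semantic": 0, "novel": 0}

    for query in queries:
        key = hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()
        if key in seen_hashes:
            counts["exact"] += 1
            continue
        seen_hashes.add(key)

        vec = np.asarray(embed_fn(query), dtype=float)
        sims = [float(np.dot(vec, v) / (np.linalg.norm(vec) * np.linalg.norm(v))) for v in seen_vectors]
        if sims and max(sims) >= threshold:
            counts["semantic"] += 1
        else:
            counts["novel"] += 1
        seen_vectors.append(vec)

    return counts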

—-

Author : tech365

Publish date : 2026-01-10 22:33:00

Copyright for syndicated content belongs to the linked Source.

—-
