Yeah, right...
> Conclusion: Always set billing caps and alerts on cloud API keys.
Sadly, way easier said than done in the case of GCP. Been a proper reason for me to avoid GCP deployments with LLM use-cases for smaller projects.
I remember looking into this a while back assuming it would be a sane feature to expect. But for some reason it's surprisingly non-trivial with GCP to set budgets. Especially if the only thing you want is a Gemini API key with finite spending.
IIRC you could either set (rate) limits via quotas, but quotas are extremely granular (like, per region per model), meaning you need to both set tons of values and understand which quotas to relax, or alternatively do some bubblegum-and-duct-tape solution where you build an event-driven pipeline to react to cost increases in your own project.
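For the curious, the duct-tape variant looks roughly like this. It's a minimal sketch, assuming budget notifications arrive as a Pub/Sub-style event with a base64-encoded JSON payload carrying `costAmount` and `budgetAmount` (that's how I remember the format, so check it against the docs). `disable_spending` is a hypothetical callback for whatever kill switch you wire up yourself (revoking the key, detaching billing); it is not a GCP API.

```python
import base64
import json
from typing import Callable


def should_cut_off(notification: dict, cap_fraction: float = 1.0) -> bool:
    """Return True once the reported cost reaches cap_fraction of the budget."""
    # Field names are an assumption about the budget notification payload.
    cost = float(notification["costAmount"])
    budget = float(notification["budgetAmount"])
    return budget > 0 and cost >= cap_fraction * budget


def handle_budget_event(event: dict, disable_spending: Callable[[], None]) -> bool:
    """Decode one notification and trigger the kill switch if over budget.

    `disable_spending` is a hypothetical hook supplied by the caller.
    Returns True if the kill switch was triggered.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    if should_cut_off(payload):
        disable_spending()
        return True
    return False
```

Note this only reacts as fast as the cost data gets reported, so it's a damage limiter, not a hard cap.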
I understand that exact budgets are hard to enforce in real-time, especially for their more complex infra offerings.
However, (1) even if it's not exactly real-time but instead enforced every hour, that's already going to go a long way, and (2) PAYG LLM usage is billed rather linearly by the number of tokens you use, so if there were an easy way to set a dollar amount and have that expressed as token budgets, that would already get you part of the way there.
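To illustrate point (2), the dollar-to-token conversion is just arithmetic. A rough sketch, where the prices and the output-to-input ratio are made-up placeholders and not actual Gemini pricing:

```python
def max_tokens_for_budget(
    budget_usd: float,
    usd_per_million_input: float,
    usd_per_million_output: float,
    output_per_input: float,
) -> tuple[int, int]:
    """Convert a dollar cap into (input_tokens, output_tokens) limits.

    Assumes purely linear per-token pricing and a fixed expected ratio of
    output tokens per input token. All numbers passed in are hypothetical.
    """
    # Cost in USD of one million input tokens plus the output they trigger.
    blended_per_million = usd_per_million_input + output_per_input * usd_per_million_output
    input_tokens = int(budget_usd * 1_000_000 / blended_per_million)
    output_tokens = int(input_tokens * output_per_input)
    return input_tokens, output_tokens


# Hypothetical: $10 cap, $1/M input, $4/M output, 0.25 output tokens per input token.
limits = max_tokens_for_budget(10.0, 1.0, 4.0, 0.25)
```

With those placeholder numbers the cap works out to 5M input tokens and 1.25M output tokens, which is the kind of value you'd want the platform to derive for you.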
Anyway, the current state of GCP budgeting makes me avoid it for production usage until I'm ready to commit to spending significant effort hardening it. For small projects, the free-tier tokens are a safe bet, but their extremely low rate limits rarely make them a good fit.