You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?
So you might say: just project the cost (with some magic) and keep it from going over the limit. Now imagine your app suddenly hits a load peak and needs to scale up, but adding a VM would push the projection over the cap. Do you refuse to scale, even though this may be only a brief peak, and let the app go down? Or do you risk the situation described above? It doesn't matter, you'll get bad press either way.
And beyond that, you'd still need to project costs like traffic volume, which can vary enormously. Not to mention the technical difficulty of coordinating that billing information across hundreds of services in real time.
And even if you do all that, you still get bad press along the lines of "we forgot to remove our payment limit and it killed our app while we were on the front page (and our alert did not trigger because we couldn't afford another mail)".
There's no way AWS (or any other cloud) is eating all these drawbacks just to have a limit. I bet it's orders of magnitude cheaper to just eat the occasional surprise bill.
Does it shut down at $150.00 or does it shut down “some amount of time and unknown dollars after you cross $150”?
I’m willing to bet it’s the latter. If, in a normal account, you had then incurred $152.78 or $166.39, did the limit work? Would customers agree?
My cloud bills continue to change for several days past the end of the month (for legitimate calculations that come in for usage incurred during the month).
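To put rough numbers on the "some amount of time and unknown dollars" question, here is a back-of-the-envelope sketch. The burn rates and the 6-hour metering lag are made-up assumptions for illustration, not figures from any provider; the only claim is that overshoot is roughly burn rate times enforcement lag.

```python
def spend_when_enforced(cap, burn_per_hour, lag_hours):
    """Approximate spend at the moment a cap actually takes effect.

    Assumes a constant burn rate and that usage data reaches the
    billing system `lag_hours` late (both hypothetical inputs).
    """
    return cap + burn_per_hour * lag_hours


# Hypothetical: $150 cap, usage data arrives 6 hours late.
steady = spend_when_enforced(150.0, 0.50, 6)    # small account ticking along
runaway = spend_when_enforced(150.0, 400.0, 6)  # something gone badly wrong

print(f"steady:  ${steady:.2f}")
print(f"runaway: ${runaway:.2f}")
```

Under these assumptions the steady case lands a few dollars over the cap, while the runaway case overshoots by thousands but still stops, which is the difference between an imprecise limit and no limit at all.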
We couldn't technically make it stop exactly at $150.00, but only at $167.89 or whatever, so we are letting it run to $15k.
For catastrophic cases it doesn't matter. If it saves a person from an unexpected $15k bill then it works. Even for many businesses it would be ok to drop everything - I know some which can withstand being offline for a day, but not a $250k bill.
Make the MVP opt-in, delay anything irreversible (e.g. deleting S3 data) by a few months with a deposit to cover costs, and figure out the rest from user feedback? In other words, do it the way any other new feature gets developed in a modern shop.
> You sure? Say your app is reaching the cap. What do you do with ongoing costs? Do you block writes? Shut down VMs? Delete stored data?
Two approaches:
A) Hard limits: freeze the services immediately when your cap is reached, ideally with a heads-up some time beforehand based on predictions, if possible. This is what many VPS providers already do for unpaid bills and the like, which makes sense.
B) Courtesy: allow the services to keep working, but at a degraded performance level, as some other VPS providers do: for example, decrease disk performance, cap CPU performance, limit network speeds and so on. Eventually they could also block writes, but never delete data outright. Any of these measures should trigger monitoring alerts on the developers' side (Zabbix or a similar tool would alert them within minutes), and the vendor should also send e-mails when the measures are about to be, or are being, put in place, so that the necessary action can be taken.
> So you might say, simply project the cost (with some magic) and prevent that from going over the limit. So, imagine, your app suddenly experiences a load peak and you need to scale up. However, adding a VM would increase that projection too much. Do you not scale, despite this possibility being a small load peak, and let the app go down? Or do you risk the situation described above?
There's a difference between having the current capacity with a degraded performance during the spike and killing the entire app. You don't always need to scale up, depending on your failure modes. Having consistent service response times is overrated, as is needing to serve every single request without ever telling a small portion of your users that your service is experiencing high load - there should be solutions in place to deal with the backpressure and prevent data loss even under these circumstances anyways.
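As a rough illustration of what "dealing with the backpressure" could look like at fixed capacity, here is a minimal load-shedding sketch. `SheddingQueue` and the status codes it returns are hypothetical names chosen for this example, not part of any framework: requests beyond a fixed backlog get an immediate "busy" answer instead of triggering a scale-up.

```python
import queue


class SheddingQueue:
    """Bounded request queue that sheds load instead of growing."""

    def __init__(self, max_pending):
        # queue.Queue raises queue.Full on put_nowait() once maxsize is hit.
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Return 202 if the request was queued, 503 if it was shed."""
        try:
            self._pending.put_nowait(request)
            return 202
        except queue.Full:
            return 503

    def take(self):
        """Hand the oldest queued request to a worker."""
        return self._pending.get_nowait()


q = SheddingQueue(max_pending=2)
statuses = [q.submit(i) for i in range(3)]
print(statuses)  # the third request is rejected rather than queued
```

Accepted requests are never dropped, so nothing is lost; the small share of users who hit the limit just get told to retry later.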
Unless you work for a government organization or on some other piece of software that is critical to society, degraded performance is probably okay, and no one really cares about or remembers even small outages, whether it's a large site, a small non-profit or a side project. And if you do work on something that critical, you probably have enough money to throw around that billing caps aren't relevant.
If you subscribe to those beliefs about always needing to be up and serve requests, however, then there's another option:
C) Billing alerts: something that most of the providers out there already provide in some capacity, however in fairly bad ways; if AWS can bill you for Lambda functions on a 1ms basis, then there's no excuse for not receiving billing alerts the very instant when this spike first happens: https://aws.amazon.com/about-aws/whats-new/2020/12/aws-lambda-changes-duration-billing-granularity-from-100ms-to-1ms/
Better yet, allow your clients to choose which of those mechanisms they desire to use, in the order of the potentially least expensive (infrastructure wise) to the most: A, B or C. That way the little guys for whom a 10k bill would be life ruining could just use A, whereas startups could stick with B and huge corporations who have a large runway of cash to burn could use C.
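A minimal sketch of how that per-customer choice might be modeled, assuming a single spend figure and a single cap. The `Policy` enum, the `action_for` function, the action names and the 80% warning threshold are all hypothetical, not any provider's actual API.

```python
from enum import Enum


class Policy(Enum):
    HARD_LIMIT = "A"   # freeze services at the cap
    DEGRADE = "B"      # keep running, but throttled
    ALERT_ONLY = "C"   # never touch the services, just notify


def action_for(policy, spend, cap, warn_ratio=0.8):
    """Pick what the provider does for the current spend level."""
    if spend < cap * warn_ratio:
        return "none"
    if spend < cap:
        return "warn"       # heads-up before the cap, under every policy
    if policy is Policy.HARD_LIMIT:
        return "freeze"
    if policy is Policy.DEGRADE:
        return "throttle"
    return "alert"


for policy in Policy:
    print(policy.name, action_for(policy, spend=200, cap=150))
```

The policies only diverge once the cap is actually crossed; before that, every customer gets the same early warning.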
> Doesn't matter, you'll get bad press either way.
Bad press? As opposed to what, going broke and not being able to pay your rent because of unpredictably large bills with no way to limit them, just because your side project got popular on Reddit or Hacker News?
There's a world of difference between what's needed by corporations and what's feasible for private individuals, so for as long as there's a chance of such bills, I will not use Azure, AWS, GCP or any other platform like that.
Remember: these surprise bills will only be "eaten" by the larger providers based on their own goodwill. There's not much preventing them from banning you outright.