The hardware these LLMs run on isn't going to get 10x faster or cheaper in the span of a couple of years. It will get incrementally faster, and only at the cost of buying new, expensive datacenter GPU hardware. It's not going to magically save them from losing money on serving requests.