Thanks for the extra clarifications, but the claims were something impossible, like a 23 GB model only using 6 GB with this change. So maybe before this change it would have used a lot more than 23 GB. I was referring to those miracle memory reductions, which unfortunately aren't possible. I would like to try 3-bit quantizations when models and software are ready (found none in my searches today).
Yes, those claims were a bit much, and in fairness jart chimed in to say so too. [1]
fwiw, I'm not an ML person, but it doesn't seem entirely crazy to me to think that SSDs are becoming fast enough that you could avoid keeping a huge model in RAM in some cases. Especially if "computational SSDs" (SSDs that can do some basic first-stage computation without transferring the input data over PCIe) ever become common. (I think some of the ML accelerators for sale today might be approximately this.)
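To make the "don't keep it all in RAM" idea concrete, here's a minimal Python sketch using mmap; the filename model.bin is just a placeholder, and this is only an illustration of demand paging, not how any particular inference library loads weights:

    import mmap

    # Map the weights file instead of reading it into RAM up front;
    # the OS faults pages in from the SSD only when they are touched.
    with open("model.bin", "rb") as f:
        weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

        # Touching a slice pulls just those pages off the SSD;
        # untouched parts of the file never occupy physical memory.
        first_block = weights[:4096]
        print(len(first_block))

The trade-off is that every cold access pays SSD latency instead of RAM latency, which is why the claim only starts to look plausible as SSDs (or computational SSDs) get fast enough.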
Much of performance work in computing is about moving data around the memory hierarchy in ways that are inconvenient to programmers.
I made an SSD into a spare swap device and basically treated my system as having RAM + the SSD's worth of RAM. It allowed me to finish a few big jobs (~96 GB of RAM) overnight that wouldn't have finished otherwise.