
Unless the prior code was using O_DIRECT, the data was getting loaded into the kernel's page cache, and then the application was copying it into its own anonymous memory. Now the copy isn't happening. There are some subtleties involved [1] but it's not crazy to claim approximately half the RAM usage, even before bringing multiple processes into the picture.

[1] The kernel doesn't necessarily load the whole thing into page cache at once and keep it around indefinitely. It might have been recognizing a sequential loading pattern before and basically discarding pages almost immediately, whereas now it might be keeping them for much longer. Or it might now essentially skip loading the whole thing in at once and do it page-by-page on demand, which could be more RAM-efficient but slower. To some extent, you can control these behaviors with madvise, mlock, MAP_LOCKED, MAP_POPULATE, as well as various sysctls. Also, if it had to page out before, the anonymous memory was "dirty" and thus had to be swapped (written out to disk), whereas the mmap()ed bytes are "clean" and can simply be discarded and (if needed to be paged back in later) reread from the existing file unchanged.
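
To make the contrast concrete, here's a minimal sketch (Linux, C; the "model.bin" filename is hypothetical and error handling is abbreviated) of the read()-into-a-buffer path versus the mmap() path, with the kind of madvise hint mentioned above:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("model.bin", O_RDONLY);      /* hypothetical file */
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

        /* Approach 1: read(). The kernel fills the page cache, then copies
           into this anonymous buffer -- the data can sit in RAM twice, and
           the buffer is "dirty", so paging it out means writing to swap. */
        char *buf = malloc(st.st_size);
        if (!buf) { perror("malloc"); return 1; }
        for (off_t off = 0; off < st.st_size; ) {
            ssize_t n = read(fd, buf + off, st.st_size - off);
            if (n <= 0) { perror("read"); return 1; }
            off += n;
        }
        free(buf);

        /* Approach 2: mmap(). The page cache pages are mapped directly into
           the address space -- no second copy, and the pages are "clean",
           so under memory pressure they can simply be dropped and reread
           from the file later. */
        char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) { perror("mmap"); return 1; }
        madvise(map, st.st_size, MADV_SEQUENTIAL); /* hint: sequential access */
        /* madvise(map, st.st_size, MADV_WILLNEED);   prefault, ~MAP_POPULATE */

        munmap(map, st.st_size);
        close(fd);
        return 0;
    }

Whether mmap actually wins in practice depends on the access pattern, as the footnote says.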



Thanks for the extra clarifications, but the claims were something impossible, like a 23 GB model only using 6 GB with this change. So maybe before this change it would have used a lot more than 23 GB. I was referring to those miracle memory reductions, which are unfortunately not possible. I would like to try 3-bit quantizations when models and software are ready (I found none in my searches today).


Yes, those claims were a bit much, and in fairness jart chimed in to say so too. [1]

fwiw, I'm not an ML person, but it doesn't seem entirely crazy to me to think that SSDs are becoming fast enough that you could avoid keeping a huge model in RAM in some cases. Especially if "computational SSDs" (SSDs that can do some basic first-stage computation without transferring the input data over PCIe) ever become common. (I think some of the ML accelerators for sale today might be approximately this.)

[1] https://news.ycombinator.com/item?id=35393615


Much of performance in computing is about moving data around the memory hierarchy in ways that are inconvenient to programmers.

I made an SSD into a spare swap device, and basically treated my system as having RAM + SSD's worth of RAM. It allowed me to finish a few big jobs (~96 GB of RAM) overnight that wouldn't have finished otherwise.
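
For reference, the same thing can be done programmatically; a minimal sketch (Linux, C, needs root; the device path is hypothetical, and the device must already have been formatted with mkswap) using the swapon(2) syscall:

    #include <stdio.h>
    #include <sys/swap.h>

    int main(void) {
        /* Give the SSD a high priority so it's preferred over other swap. */
        int flags = SWAP_FLAG_PREFER | (10 & SWAP_FLAG_PRIO_MASK);
        if (swapon("/dev/nvme0n1p2", flags) < 0) { /* hypothetical partition */
            perror("swapon");
            return 1;
        }
        return 0;
    }

In practice you'd usually just run the swapon(8) command (which calls the same syscall) rather than write this yourself.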



