[Not a specialist, just a keen armchair fan of this sort of work]
> In addition to being a graph model, BDH admits a GPU-friendly formulation.
I remember, about two years ago, people spotting that if you just pushed the weights through a sigmoid and reduced the floats down to -1, 0 or 1, a lot of LLMs barely lost any performance, but you suddenly opened up the ability to run them on multi-core CPUs, which are obviously a lot cheaper and more power efficient. And yet nothing much seems to have moved forward there.
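For anyone who hasn't seen it, the core trick is roughly this (a minimal NumPy sketch in the spirit of the BitNet b1.58 "absmean" ternary scheme; the single per-matrix scale and the function names are my own simplification, not the exact published recipe):

```python
import numpy as np

def ternarize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a float weight matrix to {-1, 0, +1} plus a per-matrix scale.

    Rough sketch of absmean ternary quantization; real recipes differ in
    details like per-group scaling and how rounding interacts with training.
    """
    scale = np.abs(W).mean() + eps                       # one scalar scale per matrix
    W_ternary = np.clip(np.round(W / scale), -1, 1).astype(np.int8)
    return W_ternary, scale

def ternary_matmul(x: np.ndarray, W_ternary: np.ndarray, scale: float):
    """y = x @ (scale * W_ternary). With {-1, 0, +1} weights the inner loop
    is just additions/subtractions, which is what makes CPUs attractive."""
    return (x @ W_ternary) * scale

# Tiny demo: the quantized layer stays close to the float one.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(256, 256))
x = rng.normal(size=(1, 256))
Wq, s = ternarize(W)
print(np.abs(x @ W - ternary_matmul(x, Wq, s)).mean())
```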
I'd love to see new approaches that explicitly don't "admit a GPU-friendly formulation", but still move the SOTA forward. Has anyone seen anything even getting close, anywhere?
> It exhibits Transformer-like scaling laws: empirically BDH rivals GPT2 performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data.
That is disappointing. It needs to do better, in some dimension, to get investment, and I do think alternative approaches are needed now.
From the paper though there are some encouraging side benefits to this approach:
> [...] a desirable form of locality: important data is located just next to the sites at which it is being processed. This minimizes communication, and eliminates the most painful of all bottlenecks for reasoning models during inference: memory-to-core bandwidth.
> Faster model iteration. During training and inference alike, BDH-GPU provides insight into parameter and state spaces of the model which allows for easy and direct evaluation of model health and performance [...]
> Direct explainability of model state. Elements of state of BDH-GPU are directly localized at neuron pairs, allowing for a micro-interpretation of the hidden state of the model. [...]
> New opportunities for ‘model surgery’. The BDH-GPU architecture is, in principle, amenable to direct composability of model weights in a way resemblant of composability of programs [...]
These, to my pretty "lay" eyes, look like attractive features to have. The question I have is whether the existing transformer-based approach is now "too big to fail" in the eyes of the people who make the investment calls, and whether this will get the work needed to take it from GPT2 performance to GPT5+.
> I'd love to see new approaches that explicitly don't "admit a GPU-friendly formulation", but still move the SOTA forward. Has anyone seen anything even getting close, anywhere?
The speedup from using a GPU over a CPU is around 100x, as a rule of thumb. There's also been an immense amount of work on maximizing throughput when training on a pile of GPUs together... and a SOTA model will still take a long time to train. So even if you do have a non-GPU algorithm which is better, it'll take you a very, very long time to train it - by which point the best GPU algorithms will have also improved substantially.
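Just to make that concrete, taking the 100x figure at face value (purely illustrative arithmetic; the 30-day run length is a made-up round number, not a measurement):

```python
# Illustrative only: how a 100x slowdown compounds with already-long training runs.
gpu_training_days = 30          # assume a large run takes about a month on GPUs (made-up figure)
slowdown = 100                  # the rule-of-thumb GPU-over-CPU factor from above
cpu_training_years = gpu_training_days * slowdown / 365
print(f"~{cpu_training_years:.1f} years on CPUs")   # ~8.2 years
```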
Wow, that number requires STRONG caveats, lest it be called out as completely false.
Take away the tensor cores (unless all you do is matmuls?), and an H100 has roughly 2x as many f32 flops as a Zen5 CPU, which is considerably cheaper. I suspect brute-force HW/algorithms are not going to age well: https://www.sigarch.org/dont-put-all-your-tensors-in-one-bas...
(/personal opinion)
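For anyone who wants to sanity-check that, here's my back-of-envelope version of the comparison (the core count, clock and peak figures are approximations I've assumed for illustration, not benchmarks):

```python
# Rough peak-FP32 comparison behind the "roughly 2x" claim above.
# All figures are approximate/assumed; real sustained throughput will differ.

h100_fp32_tflops = 67.0   # H100 CUDA-core FP32 peak, tensor cores excluded (approx.)

# A big server Zen 5 part (e.g. a 128-core EPYC with full 512-bit AVX-512):
cores = 128
flops_per_cycle_per_core = 2 * 16 * 2   # 2 FMA pipes x 16 fp32 lanes x (mul + add)
clock_ghz = 3.0                         # assumed sustained all-core clock under AVX-512 load
zen5_fp32_tflops = cores * flops_per_cycle_per_core * clock_ghz / 1000

print(f"Zen 5 peak ~{zen5_fp32_tflops:.0f} TFLOPS fp32")
print(f"H100 peak  ~{h100_fp32_tflops:.0f} TFLOPS fp32 (no tensor cores)")
print(f"ratio      ~{h100_fp32_tflops / zen5_fp32_tflops:.1f}x")  # lands around 2-3x
```

Switch the tensor cores back on at low precision and the gap opens right back up, which is exactly the "unless you only do matmuls" caveat.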