A lot of people are making the mistake of noticing that local models have been 12-24 months behind SotA ones for a good portion of the last couple years, and then drawing a dotted line assuming that continues to hold.
It simply.. doesn't. The SotA models are enormous now, and there's no free lunch on compression/quantization here.
Opus 4.6 capabilities are not coming to your (even 64-128gb) laptop or phone in the popular architecture that current LLMs use.
Now, that doesn't mean that a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and TBD if it's possible to get the quality/outcomes seen with these models without that broad "world" knowledge.
It also doesn't preclude a new architecture or other breakthrough. I'm simply stating it doesn't happen with the current way of building these.
edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but last I saw the power requirements are too steep for a mobile device.
Yeah, but that's the current state of the art after decades of aggressive optimizations, there's no foreseeable future where we'll ever be able to cram several orders of magnitude more ram into a phone.
We already cram several orders of magnitude more flash storage into phone than RAM (e.g. my phone has 16 GB RAM but 1 TB storage); even now, with some smart coding, if you don't need all that data at the same time for random access at sub millisecond speed, it's hard to tell the difference.
The gap between SOTA models and open / local models continues to diminish as SOTA is seeing diminishing returns on scaling (and that seems to be the main way they are "improving"), whereas local models are making real jumps. I'm actually more optimistic local models will catch up completely than I am SOTA will be taking any great leaps forward.
Would the model even need that breath of knowledge? Humans just look things up in books or on Wikipedia, which you can store on a plain old HDD, not VRAM. All books ever written fit into about 60TB if you OCR them, and the useful information in them probably in a lot less, that's well within the range of consumer technology.
Pretty sure there’s at least a couple orders of magnitude in purely algorithmic areas of LLM inference; maybe training, too, though I’m less confident here. Rationale: meat computers run on 20W, though pretraining took a billion years or so.
It simply.. doesn't. The SotA models are enormous now, and there's no free lunch on compression/quantization here.
Opus 4.6 capabilities are not coming to your (even 64-128gb) laptop or phone in the popular architecture that current LLMs use.
Now, that doesn't mean that a much narrower-scoped model with very impressive results can't be delivered. But that narrower model won't have the same breadth of knowledge, and TBD if it's possible to get the quality/outcomes seen with these models without that broad "world" knowledge.
It also doesn't preclude a new architecture or other breakthrough. I'm simply stating it doesn't happen with the current way of building these.
edit: forgot to mention the notion of ASIC-style models on a chip. I haven't been following this closely, but last I saw the power requirements are too steep for a mobile device.