> But, history says the supercomputer of today will fit in your pocket in a few ...

SwellJoe · 2026-06-10T00:16:10 1781050570

Yeah, that's probably true, but we're also seeing that there are still tons of inefficiencies in how LLMs are being run. Seems like every couple of months there's some new technique to squeeze more performance out of less hardware: KV caching improvements, fast attention, speculative decoding, dynamic quantization, quantization-aware training, etc.

That said, I recently replaced my five year old self-built PC (with a top-of-the-line desktop CPU, chipset, memory, and GPU of the time) with a new everything-the-best build, and while it's clear we're not keeping up with Moore's Law anymore, it's still 4-5 times faster for compute-intensive stuff, especially parallelizable tasks. We're still getting faster/cheaper. So, the time scale is maybe ten years rather than five.
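
As a rough check on the pace described above, here is a minimal Python sketch that converts "4-5 times faster after five years" into an implied doubling time. The function names are made up for illustration, and the sketch assumes smooth exponential improvement, which real hardware generations do not follow. The two-year figure is the commonly quoted Moore's Law doubling period, included only for comparison.

```python
import math

def doubling_time_years(speedup: float, years: float) -> float:
    """Doubling period implied by a total `speedup` observed over `years`."""
    return years * math.log(2) / math.log(speedup)

def speedup_over(years: float, doubling_years: float) -> float:
    """Total speedup after `years` if performance doubles every `doubling_years`."""
    return 2.0 ** (years / doubling_years)

def years_to_reach(target: float, speedup: float, years: float) -> float:
    """Years to reach `target`x if the observed rate simply continues (an assumption)."""
    return years * math.log(target) / math.log(speedup)

if __name__ == "__main__":
    for observed in (4.0, 5.0):
        print(f"{observed:.0f}x in 5 years -> doubling every "
              f"{doubling_time_years(observed, 5.0):.2f} years")
    print(f"two-year doubling over 5 years -> {speedup_over(5.0, 2.0):.2f}x")
```

Under these assumptions the observed 4-5x works out to a doubling roughly every 2.2 to 2.5 years, somewhat slower than a strict two-year doubling.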

ethbr1 · 2026-06-10T12:33:27 1781094807

AI inference will very likely follow the same path as general-purpose computing: variety and innovation in software lead to standardization on the highest-performance approaches.

As that transition happens, hardware evolves from general purpose (because nobody knows what's needed and hardware design is slow) to fixed function high performance (once requirements are better defined).

GPUs (and TPUs) are a weird middle-ground here, as they're already fairly specialized, but I wouldn't bet against next gen AI inference-optimized hardware architectures dominating that use case in ~10 years if the pace of AI arch tweaking slows.

The efficiency/power/cost gains from fixed function optimization are always too great, and the only thing that holds that approach back is rapidly mutating requirements.

pixl97 · 2026-06-10T00:37:11 1781051831

Really the biggest concerns are not computers getting spectacularly faster, but 'intelligence' algorithms getting orders of magnitude better.

Drop the power requirements 1000 fold, and yea you will be able to make your own SOTA model on the cheap. The problem is the person that has a few exaflops of power will still leave you in the dust in the intelligence explosion that would happen after an event like this.

mlyle · 2026-06-10T03:59:35 1781063975

Depends on the intelligence vs. compute scaling law, which I think no one really knows. Pretty likely to be some degree of diminishing returns, but how much? Is it logarithmic, inverse quadratic, …

If training models gets way cheaper, I would expect the diminishing returns to get steeper too.

pixl97 · 2026-06-10T14:34:33 1781102073

And you're right, no one has any clue what the limits of intelligence are. Though to me it seems odd that humanity would have reached the pinnacle of it in the last million years or so, after a few billion years of life's development. It just seems improbable that we are close to the limits.

mlyle · 2026-06-11T18:30:09 1781202609

I am not making an argument about limits. I just expect some degree of diminishing returns.

A related argument is speed of intelligence vs capability at that speed. You can think of a three way trade off between latency, cost, and capability that is unlikely to be linear in any dimension and that changes in steps as technology or biology evolves.

That ultimately relates to the properties of the computing substrate, and is almost certainly bounded by some kind of thermodynamic limits that present systems do not approach.

trhway · 2026-06-10T06:25:54 1781072754

>Pretty likely to be some degree of diminishing returns

Intelligence may be different. If we look at biological brains, do we get diminishing returns or the completely opposite scaling law when we compare our brain against, say, a gorilla's?

Vetch · 2026-06-10T10:49:52 1781088592

Interesting thought to consider in principle but fails because gorilla brains continued to evolve too, just along a different path. They're not snapshots of ancestral species locked in time.

hedora · 2026-06-10T15:42:36 1781106156

Also, it’s definitely diminishing returns, by weight, at least.

Architecture / biological structure matters more.

I’d expect weight and wattage to be proportional for animals, at least.

altcognito · 2026-06-10T01:38:25 1781055505

Single-core clock speed hasn't had much of an upgrade, but the architecture for doing exactly what they are doing? That will improve for at least 5-10 years. There are both huge power gains from Processing in Memory (PIM) chips (a 70-80% reduction in energy) and engineering improvements that make memory cheaper and cheaper.

theLiminator · 2026-06-10T05:45:56 1781070356

Yes, I'm talking about a supercomputer from today in your pocket. That probably requires at least 5000x perf/watt if not even more.

hedora · 2026-06-10T15:45:39 1781106339

That’s only two orders of magnitude of software optimizations, a bunch of plus deltas, and one order of magnitude on hw.

I’d give that over 50% odds of happening in the next few years.
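
To make the decomposition above concrete, here is a small Python sketch of the arithmetic. The 5000x target and the two named factors come from the comments; the variable names are illustrative, and treating the gains as independent multipliers is an assumption.

```python
import math

# Figures taken from the thread.
target_gain = 5000.0      # "at least 5000x perf/watt"
software_gain = 10 ** 2   # "two orders of magnitude" from software
hardware_gain = 10 ** 1   # "one order of magnitude on hw"

# Assumption: the gains multiply independently.
named_gain = software_gain * hardware_gain

# What the unnamed "plus deltas" would have to contribute to hit the target.
remaining_gain = target_gain / named_gain

if __name__ == "__main__":
    print(f"target is {math.log10(target_gain):.2f} orders of magnitude")
    print(f"software x hardware = {named_gain}x")
    print(f"left for the plus deltas = {remaining_gain}x")
```

So the two named factors cover 1000x, and the remaining 5x would have to come from the assorted smaller improvements.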

theLiminator · 2026-06-10T17:16:36 1781111796

I don't disbelieve a 5000x speedup is possible, I disbelieve that a modern day supercomputer will fit in your pocket in even the next 10 years.

KeplerBoy · 2026-06-10T13:09:53 1781096993

That has never been true, unfortunately. The 2005 TOP500 was led by Blue Gene/L, achieving 280 FP64 TFlop/s.

Apple is talking about 17.5 FP16 TFlop/s on the iPhone 17 neural engine. So 20 years later we are still nowhere near, not even at reduced precision.

hedora · 2026-06-10T16:14:24 1781108064

That’s a factor of 10-20.

You can get an SoC that does 126 TOPS (Strix Halo) in a tablet form factor, which leaves a factor of two. (I’ll count them as equivalent ops, since software couldn’t use low-precision floating point back then.) So, not quite “pocket”, but probably “purse” and certainly backpack.
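
The ratios being argued over can be checked with a few lines of Python. This is only a sketch using the numbers quoted in the thread; the names are illustrative, and it knowingly compares FP64 TFlop/s, FP16 TFlop/s, and TOPS as if they were the same unit, which is the simplification the comment above makes explicitly.

```python
# Throughput figures as quoted in the thread (units differ in precision).
blue_gene_l = 280.0        # FP64 TFlop/s, 2005 TOP500 leader
iphone_17_neural = 17.5    # FP16 TFlop/s, as quoted for the neural engine
strix_halo = 126.0         # TOPS, as quoted for the tablet-class SoC

def gap(reference: float, device: float) -> float:
    """How many times faster the reference is, ignoring precision differences."""
    return reference / device

if __name__ == "__main__":
    print(f"Blue Gene/L vs phone:  {gap(blue_gene_l, iphone_17_neural):.1f}x")
    print(f"Blue Gene/L vs tablet: {gap(blue_gene_l, strix_halo):.1f}x")
```

On these figures the phone is about 16x short and the tablet-class part a bit over 2x short, consistent with the "factor of 10-20" and "factor of two" estimates.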

CooCooCaCha · 2026-06-10T13:35:28 1781098528

Because we’ve been able to spend more and more on each next round of miniaturization. That does not seem sustainable indefinitely, or even physically possible.

christkv · 2026-06-10T12:12:39 1781093559

In five years I think you will be able to train a frontier modem for much less money than today and the power hungry hardware of today will be cheap second hand due to the power usage.

ethbr1 · 2026-06-10T12:36:42 1781095002

There are probably better ways to communicate across a wire than having an LLM voltage-bang, but it's certainly an interesting use case.

DeathArrow · 2026-06-10T05:40:13 1781070013

>but I don't think the rate of improvement of compute/watt will match the previous decades.

Unless we invest heavily in research and find new ways to make chips. But I think there's not enough motivation and money to do that.

SwellJoe · 2026-06-10T09:00:31 1781082031

There's literally never been more money being thrown at that problem.