More

sothatsit · 2026-01-17T09:10:13 1768641013

The idea is that neofeudal lords would own fully automated systems to produce everything they need. They wouldn't need to make stuff to sell to us at all, they could just pursue their own goals. We would be irrelevant to them, other than a nuisance if we tried to get some of their resources for ourselves.

Imagine if you owned a million humanoid robots and a data center with your own super-intelligence. Would you produce doodads to sell to people? Or would you build your own rockets to mine asteroids, fortresses and weapons systems to protect yourself, and palaces for you to live in?

I don't agree that this is where we are headed, but that is the idea. Thinking about this in relation to our current economy is missing the point.

imiric · 2026-01-17T09:19:17 1768641557

Aha, so late-game Factorio. It's a nice fantasy, but I don't think that the rest of humanity will stand by and allow the entire system to function autonomously. It's more likely that heads will roll far before such system is in place.

sothatsit · 2026-01-15T22:07:51 1768514871

If you use coding agents as a black box, then yes you might learn less. But if you use them to experiment more, your intuition will get more contact with reality, and that will help you learn more.

For example, my brother recently was deciding how to structure some auth code. He told me he used coding agents to just try several ideas and then he could pick a winner and nail down that one. It's hard to think of a better way to learn the consequences of different design decisions.

Another example is that I've been using coding agents to write CUDA experiments to try to find ways to optimise our codegen. I need an understanding of GPU performance to do this well. Coding agents have let me run 5x the number of experiments I would be able to code, run, and analyse on my own. This helps me test my intuition, see where my understanding is wrong, and correct it.

In this whole process I will likely memorise fewer CUDA APIs and commands, that's true. But I'm happy with that tradeoff if it means I can learn more about bank conflicts, tradeoffs between L1 cache hit rates and shared memory, how to effectively use the TMA, warp specialisation, block swizzling to maximise L2 cache hit rates, how to reduce register usage without local spilling, how to profile kernels and read the PTX/SASS code, etc. I've never been able to put so much effort into actually testing things as I am learning them.

sothatsit · 2026-01-15T21:03:08 1768510988

FWIW, I've heard many people say that with voice dictation they ramble to LLMs and by speaking more words can convey their meaning well, even if their writing quality is low. I don't do this regularly, but when I have tried it, it seemed to work just as well as my purposefully-written prompts. I can imagine a non-technical person rambling enough that the AI gets what they mean.

headcanon · 2026-01-15T21:17:19 1768511839

Thats a fair counterpoint, and it has helped translate my random thoughts into more coherent text. I also haven't taken advantage of dictation much at all either, so maybe I'll give it a try. I still think the baseline skill that writing gives you translates to an LLM-use skill, which is thinking clearly and knowing how to structure your thoughts. Maybe folks can get that skill in other ways (oration, art, etc.). I don't need to give it essays, but I do need to give it clear instructions. Every time it spins off and does something I don't want, its because I didn't clarify my thoughts correctly.

arcanemachiner · 2026-01-16T00:49:54 1768524594

Setting up SpeechNote with Kokoro is one of the the best things I've ever done.

I can speak faster than I type, and the flow state is much smoother when you can just dump a stream of consciousness into the context window in a matter of seconds. And the quality of the model is insane for something that runs locally, on reasonable hardware no less.

Swearing at an LLM is also much more fun when done verbally.

dworks · 2026-01-15T22:13:32 1768515212

The prompt the user enters is actually not the prompt. Most agents will have an additional background step to use the user's prompt to generate the actual, detailed instructions, which is then used as the actual prompt for code generation. That's how the ability to build a website from "create a website that looks like twitter" is achieved.

sothatsit · 2026-01-15T03:43:01 1768448581

The flip side is that these companies seem to be capacity constrained (although that is hard to confirm). If you assume the labs are capacity constrained, which seems plausible, then building more capacity could pay off by allowing labs to serve more customers and increase revenue per customer.

This means the bigger questions are whether you believe the labs are compute constrained, and whether you believe more capacity would allow them to drive actual revenue. I think there is a decent chance of this being true, and under this reality the investments make more sense. I can especially believe this as we see higher-cost products like Claude Code grow rapidly with much higher token usage per user.

This all hinges on demand materialising when capacity increases, and margins being good enough on that demand to get a good ROI. But that seems like an easier bet for investors to grapple with than trying to compare future investment in capacity with today's revenue, which doesn't capture the whole picture.

Forgeties79 · 2026-01-15T03:53:57 1768449237

I am not someone who would ever be ever be considered an expert on factories/manufacturing of any kind, but my (insanely basic) understanding is that typically a “factory” making whatever widgets or doodads is outputting at a profit or has a clear path to profitability in order to pay off a loan/investment. They have debt, but they’re moving towards the black in a concrete, relatively predictable way - no one speculates on a factory anywhere near the degree they do with AI companies currently. If said factory’s output is maxed and they’re still not making money, then it’s a losing investment and they wouldn’t expand.

Basically, it strikes me as not really apples to apples.

sothatsit · 2026-01-15T04:09:40 1768450180

Consensus seems to be that the labs are profitable on inference. They are only losing money on training and free users.

The competition requiring them to spend that money on training and free users does complicate things. But when you just look at it from an inference perspective, looking at these data centres like token factories makes sense. I would definitely pay more to get faster inference of Opus 4.5, for example.

This is also not wholly dissimilar to other industries where companies spend heavily on R&D while running profitable manufacturing. Pharma semiconductors, and hardware companies like Samsung or Apple all do this. The unusual part with AI labs is the ratio and the uncertainty, but that's a difference of degree, not kind.

Eisenstein · 2026-01-15T07:44:55 1768463095

> But when you just look at it from an inference perspective, looking at these data centres like token factories makes sense.

So if you ignore the majority of the costs, then it makes sense.

Opus 4.5 was released on November 25, 2025. That is less than 2 months ago. When they stop training new models, then we can forget about training costs.

fc417fc802 · 2026-01-15T10:14:12 1768472052

I'm not taking a side here - I don't know enough - but it's an interesting line of reasoning.

So I'll ask, how is that any different than fabs? From what I understand R&D is absurd and upgrading to a new node is even more absurd. The resulting chips sell for chump change on a per unit basis (analogous to tokens). But somehow it all works out.

Well, sort of. The bleeding edge companies kept dropping out until you could count them on one hand at this point.

At first glance it seems like the analogy might fit?

short_sells_poo · 2026-01-15T10:57:07 1768474627

Someone else mentioned it elsewhere in this thread, and I believe this is the crux of the issue: this is all predicated in the actual end users finding enough benefit in LLM services to keep the gravy train going. It's irrelevant how scalable and profitable the shovel makes are, to keep this business afloat long term, the shovelers - ie the end users - have to make money using the shovesl. Those expectations are currently ridiculously inflated. Far beyond anything in the past.

Invariably, there's going to be a collapse in the hype, the bubble will burst, and an investment deleveraging will remove a lot of money from the space in a short period of time. The bigger the bubble, the more painful and less survivable this event will be.

sothatsit · 2026-01-15T08:27:40 1768465660

Inference costs scale linearly with usage. R&D expenses do not.

That's not to mention that Dario Amodei has said that their models actually have a good return, even when accounting for training costs [0].

[0] https://youtu.be/GcqQ1ebBqkc?si=Vs2R4taIhj3uwIyj&t=1088

Eisenstein · 2026-01-15T10:08:55 1768471735

> Inference costs scale linearly with usage. R&D expenses do not.

Do we know this is true for AI?

brookst · 2026-01-15T12:29:58 1768480198

It’s pretty much the definition of fixed costs versus variable costs.

You spend the same amount on R&D whether you have one hobbyist user or 90% market share.

sothatsit · 2026-01-15T10:20:02 1768472402

Yes. R&D is guaranteed to fall as a percentage of costs eventually. The only question is when, and there is also a question of who is still solvent when that time comes. It is competition and an innovation race that keeps it so high, and it won't stay so high forever. Either rising revenues or falling competition will bring R&D costs down as a percentage of revenue at some point.

saagarjha · 2026-01-15T11:25:51 1768476351

Yes, but eventually may be longer than the market can hold out. So far R&D expenses have skyrocketed and it does not look like that will be changing anytime soon.

sothatsit · 2026-01-15T20:12:24 1768507944

That's why it is a bet, and not a sure thing.

Forgeties79 · 2026-01-15T19:50:57 1768506657

>Consensus seems to be that the labs are profitable on inference. They are only losing money on training and free users.

That sounds like “we’re profitable if you ignore our biggest expenses.” If they could be profitable now, we’d see at least a few companies just be profitable and stop the heavy expenses. My guess is it’s simply not the case or everyone’s trapped in a cycle where they are all required to keep spending too much to keep up and nobody wants to be the first to stop. Either way the outcome is the same.

sothatsit · 2026-01-15T20:11:37 1768507897

This is just not true. Plenty of companies will remain unprofitable for as long as they can in the name of growth, market share, and beating their competition. At some point it will level out, but while they can still raise cheap capital and spend it to grow, they will.

OpenAI could put in ads tomorrow and make tons of money overnight. The only reason they don't is competition. But when they start to find it harder to raise capital to fund their growth, they will.

Forgeties79 · 2026-01-17T20:17:25 1768681045

I understand how this has worked historically but when have we seen this amount of money invested so rapidly into a new area? Crypto, social media, none of it comes close. I just don’t think those rules apply anymore. As I mentioned in a previous comment this is literally altering the economies of cities and states in the US, all driven by tech company speculation. This could be my own ignorance, but it seems to me that we have never seen anything like this, and I really can’t find a single sector that has ever seen this kind of investment before. I guess maybe railroads across the US in the 19th century? I’d have to actually look at what those numbers looked like and it’s pretty hard to call that comparing apple to apples.

sothatsit · 2026-01-10T02:42:26 1768012946

This often feels like an annoying question to ask, but what models were you using?

The difference between free ChatGPT, GPT-5.2 Thinking, and GPT-5.2 Pro is enormous for areas like logic and math. Often the answer to bad results is just to use a better model.

Additionally, sometimes when I get bad results I just ask the question again with a slightly rephrased prompt. Often this is enough to nudge the models in the right direction (and perhaps get a luckier response in the process). However, if you are just looking at a link to a chat transcript, this may not be clear.

abdullahkhalids · 2026-01-10T05:11:31 1768021891

I have openrouter account, so I try different models easily. I have tried Sonnet, Opus, various versions of GPT, Deepseek. There are certainly differences in the quality. I also do rephrase prompts all the time. But ultimately, I can't quite get them to work in quantum computing. Far easier to get them to answer coding or writing related questions.

sothatsit · 2026-01-11T09:35:59 1768124159

Both Erdos #728 and #729 were solved with the use of GPT-5.2 Pro. Lesser models have much worse performance on difficult problems like these.

sothatsit · 2026-01-09T22:30:56 1767997856

Human flourishing

sothatsit · 2026-01-08T17:17:22 1767892642

And it will get even more profitable for free users when ads roll in.

sothatsit · 2026-01-08T16:56:23 1767891383

Inference costs falling 2x doesn’t decrease hardware prices when demand for tokens has increased 10x.

PaulHoule · 2026-01-08T17:34:32 1767893672

It's the ratio. If revenue goes up 10x you can afford 10x more hardware if you can afford to do it all.

sothatsit · 2026-01-07T08:04:36 1767773076

My thinking is definitely better. I spend more time worrying about the specific architecture, memory layout, GPU features, etc. to come up with ideas for optimisations, and I think less about specific implementation details. I’ve gotten a better mental model of our code faster because of this. I have also found substantial speed ups by thinking about the problem at a higher level, while iterating on implementation details quickly using Opus.

sothatsit · 2026-01-07T00:34:27 1767746067

I like these examples that predictably show the weaknesses of current models.

This reminds me of that example where someone asked an agent to improve a codebase in a loop overnight and they woke up to 100,000 lines of garbage [0]. Similarly you see people doing side-by-side of their implementation and what an AI did, which can also quite effectively show how AI can make quite poor architecture decisions.

This is why I think the “plan modes” and spec driven development are so important effective for agents, because it helps to avoid one of their main weaknesses.

[0] https://gricha.dev/blog/the-highest-quality-codebase

pugworthy · 2026-01-07T05:18:05 1767763085

To me, this doesn't show the weakness of current models, it shows the variability of prompts and the influence on responses. Because without the prompt it's hard to tell what influenced the outcome.

I had this long discussion today with a co-worker about the merits of detailed queries with lots of guidance .md documents, vs just asking fairly open ended questions. Spelling out in great detail what you want, vs just generally describing what you want the outcomes to be in general then working from there.

His approach was to write a lot of agent files spelling out all kinds of things like code formatting style, well defined personas, etc. And here's me asking vague questions like, "I'm thinking of splitting off parts of this code base into a separate service, what do you think in general? Are there parts that might benefit from this?"

sothatsit · 2026-01-07T08:09:25 1767773365

It is definitely a weakness of current models. The fact that people find ways around those weaknesses does not mean the weaknesses do not exist.

Your approach is also very similar to spec driven development. Your spec is just a conversation instead of a planning document. Both approaches get ideas from your brain into the context window.

OccamsMirror · 2026-01-07T06:59:46 1767769186

So which approach worked better?

pugworthy · 2026-01-07T18:29:31 1767810571

Challenging to answer, because we're at different levels of programming. I'm Senior / Architect type with many years of experience programming, and he's an ME using code to help him with data processing and analysis.

I have a hunch if you asked which approach we took based on background, you'd think I was the one using the detailed prompt approach and him the vague.