Hacker News | jgammell's comments

> hybrid Mamba/Gated linear attention layers,

Do any large-scale architectures use Mamba? I was under the impression that it isn't widely used yet due to a lack of efficient implementations.

> Training is also vastly more sophisticated

Is it? In what ways?


Qwen3.5 uses Gated Delta Networks, which is essentially Mamba-2 plus the delta rule. It's quite hardware-efficient.
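For the curious, here is a minimal sketch of one recurrence step of the gated delta rule. The function name, shapes, and exact gate placement are my assumptions; real implementations differ (and are chunked/parallelized, which is where the hardware efficiency comes from):

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    # One step of the gated delta rule (illustrative sketch):
    #   S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T
    # alpha in (0, 1] is a Mamba-2-style scalar forget gate;
    # beta in (0, 1] is the delta-rule write strength; k is
    # assumed to be (roughly) unit-norm.
    d = k.shape[0]
    S = alpha * S @ (np.eye(d) - beta * np.outer(k, k)) + beta * np.outer(v, k)
    return S
```

With alpha = beta = 1 and a unit-norm key, the update writes v exactly: the new state satisfies S @ k == v, which is the classic delta-rule "error-correcting" property. The readout at each step is then something like y_t = S_t @ q_t.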

> Is it? In what ways?

Reinforcement learning for reasoning could be its own topic, and so could tool use for agents.


When sampling from an LLM, people normally truncate the token probability distribution so that low-probability tokens are never sampled. So the model shouldn't produce really weird outputs even if those outputs technically have nonzero probability under the trained model.
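As a toy sketch of what that truncation looks like, here is nucleus (top-p) sampling over a made-up four-token vocabulary (the function name and shapes are for illustration only):

```python
import numpy as np

def sample_top_p(logits, p=0.9, rng=None):
    # Nucleus (top-p) truncation: sort tokens by probability and keep
    # the smallest prefix whose cumulative mass reaches p.  Tokens
    # outside that prefix can never be sampled, however nonzero their
    # probability technically is.
    rng = rng if rng is not None else np.random.default_rng(0)
    logits = np.asarray(logits, dtype=np.float64)
    probs = np.exp(logits - logits.max())   # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]         # most probable first
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, p)) + 1  # include the crossing token
    kept = order[:cutoff]
    kept_probs = probs[kept] / probs[kept].sum()  # renormalize
    return int(rng.choice(kept, p=kept_probs))
```

With logits like [10, 9, 0, -5], the last two tokens have nonzero probability but fall outside the nucleus, so they are never emitted. Top-k truncation works the same way, just with a fixed-size prefix instead of a mass threshold.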


> Generative models are probabilistic: the output will be less likely to satisfy complex requirements, particularly

This is a misinformed critique that always gets on my nerves, as someone who actually works with AI. The world is random. Ideally, generative models are random only in the sense that they sample randomly from the set of correct answers for a given problem. Of course LLMs make mistakes, but that has little to do with the fact that they are stochastic.
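A contrived illustration of the point (obviously not an LLM): a "model" whose entire probability mass sits on correct answers is random but never wrong.

```python
import random

# Toy "model" for the prompt "2 + 2 =": all probability mass is on
# strings that are correct answers.  Each draw may differ, yet every
# draw is correct -- randomness per se is not the source of mistakes.
correct = ["4", "four", "4.0"]
rng = random.Random(0)
samples = [rng.choice(correct) for _ in range(10)]
assert all(s in correct for s in samples)
```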


My university made us learn to code 'close to the metal' and IMO this is a great way to gain an understanding of what is actually going on. Program in C, no IDE, no AI tools.

The AI tools are incredibly helpful (and people who say otherwise are being disingenuous), but if you don't already roughly know how you want to implement something and you let the AI take the wheel, you aren't going to learn anything. From a learning standpoint, I feel the best approach is to plan and write your code without using AI at all, then maybe use it as a critic to give feedback on what you've done.

