When sampling from an LLM, people normally truncate the token probability distribution so that low-probability tokens are never sampled. So the model shouldn't produce really weird outputs, even if those outputs technically end up with nonzero probability after pre- and post-training.
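For anyone who hasn't seen what that truncation looks like, here's a minimal sketch of top-k plus top-p (nucleus) sampling over a raw logits vector. The function name and default thresholds are my own illustration, not any particular library's API:

```python
import numpy as np

def truncated_sample(logits, top_k=50, top_p=0.95, rng=None):
    """Sample one token id after truncating the distribution.

    Illustrative sketch: top-k keeps only the k most likely tokens,
    then top-p (nucleus) keeps the smallest set of remaining tokens
    whose cumulative probability reaches top_p.
    """
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()

    # Top-k: zero out everything below the k-th largest probability.
    if top_k < len(probs):
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()

    # Top-p: walk tokens in descending order and stop once cumulative
    # mass reaches top_p; everything after that is dropped.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = np.ones(len(probs), dtype=bool)
    keep[1:] = cum[:-1] < top_p             # always keep the top token
    mask = np.zeros(len(probs), dtype=bool)
    mask[order[keep]] = True
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum()

    return rng.choice(len(probs), p=probs)
```

A token that falls outside the truncated set gets exactly zero probability, so it will never be drawn no matter how many samples you take.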
> Generative models are probabilistic: the output will be less likely to satisfy complex requirements, particularly
As someone who actually works with AI, I find this misinformed 'critique' grating every time I see it. The world is random. Generative models are only random in the sense that they (ideally) sample at random from the set of correct answers to a given problem. Of course LLMs make mistakes, but that has nothing to do with the fact that they are random.
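To make the distinction concrete, here's a toy sketch (all numbers made up) separating sampling randomness from model error:

```python
import numpy as np

# Toy next-answer distribution for "name a prime under 10".
# Several answers are correct; the model spreads mass across them.
answers = ["2", "3", "5", "7", "9"]             # "9" is the wrong one
logits = np.array([2.0, 2.1, 1.9, 2.0, -3.0])   # illustrative values

probs = np.exp(logits - logits.max())
probs /= probs.sum()

rng = np.random.default_rng(0)
print(rng.choice(answers, size=10, p=probs))
# Varies across {"2", "3", "5", "7"}; "9" is possible but rare.
```

The run-to-run variation among the correct answers is the randomness; the small residual probability on "9" is a model error. Those are separate failure axes, and truncation (as sketched above) is what keeps such tail tokens from surfacing.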
My university made us learn to code 'close to the metal', and IMO that's a great way to gain an understanding of what is actually going on: program in C, no IDE, no AI tools.
The AI tools are incredibly helpful (and people who say otherwise are being disingenuous), but if you don't already roughly know how you want to implement something and you let the AI take the wheel, you aren't going to learn anything. From a learning standpoint, I think the best approach is to plan and write your code without using AI at all, then maybe use it as a critic to give feedback on what you've written.
Do any large-scale architectures use Mamba? I was under the impression that people don't use it yet due to a lack of efficient implementations.
> Training is also vastly more sophisticated
Is it? In what ways?