Yeah, reasoning models are "self-doubt" models.

The model is trained in a way that encourages it to re-evaluate the soundness of the tokens it has already produced during the "thinking" phase.
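
Concretely, the output of such a model reads like a scratchpad followed by a committed answer. A rough Python sketch of that shape (the <think>/</think> tags and the sample trace are made up for illustration, not any particular model's actual format):

    completion = (
        "<think>"
        "17 * 24 = 408? Let me re-check: 17 * 20 + 17 * 4 = 340 + 68 = 408. "
        "That holds up."
        "</think>"
        "17 * 24 = 408"
    )

    def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
        # Return (thinking_trace, final_answer). The re-evaluation happens
        # inside the trace; only the text after it is the committed answer.
        if open_tag in text and close_tag in text:
            start = text.index(open_tag) + len(open_tag)
            end = text.index(close_tag)
            return text[start:end].strip(), text[end + len(close_tag):].strip()
        return "", text.strip()

    thinking, answer = split_reasoning(completion)
    print(thinking)  # the self-doubt lives here
    print(answer)    # "17 * 24 = 408"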

The model's state vector is kept in a mode of open exploration: it is influenced by the tokens already emitted, but less strongly so.
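
One purely illustrative way to picture that is sampling the thinking phase at a higher temperature than the answer, so the context pins each next token down less firmly. Nothing below reflects how any real reasoning model is actually decoded; it is just a toy sampler:

    import math, random

    def sample(logits, temperature):
        # Softmax with temperature: a higher temperature flattens the
        # distribution, i.e. earlier tokens constrain the next choice less.
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]

    dummy_logits = [2.0, 1.0, 0.5, 0.1]       # stand-in for a model's output
    think_token = sample(dummy_logits, 1.3)   # exploratory during "thinking"
    answer_token = sample(dummy_logits, 0.7)  # more committed for the answer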

Non-reasoning models, by contrast, were trained with the goal of producing useful output on the first try, and they did their best to maximize that fitness function.
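
A toy way to see the difference in objective, assuming a simple exact-match reward (everything below is made up for illustration):

    def reward_first_try(output, gold):
        # Non-reasoning setup: whatever comes out first *is* the answer.
        return 1.0 if output.strip() == gold else 0.0

    def reward_after_thinking(output, gold):
        # Reasoning setup: only the text after the thinking trace is scored,
        # so the model can doubt and revise itself inside the trace for free.
        answer = output.split("</think>")[-1].strip()
        return 1.0 if answer == gold else 0.0

    gold = "408"
    print(reward_first_try("406", gold))                                   # 0.0
    print(reward_after_thinking("<think>406? No, 408.</think>408", gold))  # 1.0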


