Yeah, reasoning models are "self-doubt" models.

The model is trained in a way that encourages it to re-evaluate the soundness of the tokens it has already produced during the "thinking" phase.
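
Concretely, the output of such a model reads like a scratchpad followed by a committed answer. A rough Python sketch of that shape (the <think>/</think> tags and the sample trace are made up for illustration, not any particular model's actual format):

    completion = (
        "<think>"
        "17 * 24 = 408? Let me re-check: 17 * 20 + 17 * 4 = 340 + 68 = 408. "
        "That holds up."
        "</think>"
        "17 * 24 = 408"
    )

    def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
        # Return (thinking_trace, final_answer). The re-evaluation happens
        # inside the trace; only the text after it is the committed answer.
        if open_tag in text and close_tag in text:
            start = text.index(open_tag) + len(open_tag)
            end = text.index(close_tag)
            return text[start:end].strip(), text[end + len(close_tag):].strip()
        return "", text.strip()

    thinking, answer = split_reasoning(completion)
    print(thinking)  # the self-doubt lives here
    print(answer)    # "17 * 24 = 408"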

The model's state vector is kept in a mode of open exploration: it is influenced by the tokens already emitted, but less strongly so.
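
One purely illustrative way to picture that is sampling the thinking phase at a higher temperature than the answer, so the context pins each next token down less firmly. Nothing below reflects how any real reasoning model is actually decoded; it is just a toy sampler:

    import math, random

    def sample(logits, temperature):
        # Softmax with temperature: a higher temperature flattens the
        # distribution, i.e. earlier tokens constrain the next choice less.
        scaled = [l / temperature for l in logits]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        return random.choices(range(len(logits)), weights=weights, k=1)[0]

    dummy_logits = [2.0, 1.0, 0.5, 0.1]       # stand-in for a model's output
    think_token = sample(dummy_logits, 1.3)   # exploratory during "thinking"
    answer_token = sample(dummy_logits, 0.7)  # more committed for the answer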

Non-reasoning models, by contrast, were trained with the goal of producing useful output on the first try, and they did their best to maximize that fitness function.
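
A toy way to see the difference in objective, assuming a simple exact-match reward (everything below is made up for illustration):

    def reward_first_try(output, gold):
        # Non-reasoning setup: whatever comes out first *is* the answer.
        return 1.0 if output.strip() == gold else 0.0

    def reward_after_thinking(output, gold):
        # Reasoning setup: only the text after the thinking trace is scored,
        # so the model can doubt and revise itself inside the trace for free.
        answer = output.split("</think>")[-1].strip()
        return 1.0 if answer == gold else 0.0

    gold = "408"
    print(reward_first_try("406", gold))                                   # 0.0
    print(reward_after_thinking("<think>406? No, 408.</think>408", gold))  # 1.0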


