My point of view: this is a real advancement. I’ve always believed that, given the right data, an LLM can be trained to imitate reasoning and thereby improve its performance. However, this is still pattern matching, and I suspect the approach may not be very effective at producing true generalization. As a result, once o1 becomes generally available, we will likely notice persistent hallucinations and faulty reasoning, especially when the problem is sufficiently new or complex, beyond the “reasoning programs” or “reasoning patterns” the model learned during the reinforcement learning phase.
https://www.lycee.ai/blog/openai-o1-release-agi-reasoning
> As a result, once o1 becomes generally available, we will likely notice the persistent hallucinations and faulty reasoning, especially when the problem is sufficiently new or complex, beyond the “reasoning programs” or “reasoning patterns” the model learned during the reinforcement learning phase.
I had been using 4o as a rubber duck for some projects recently. Since I appeared to have access to o1-preview, I decided to go back and redo some of those conversations with it.
I think your comment is spot on. It's definitely an advancement, but it still makes some pretty clear mistakes and engages in some fairly faulty reasoning. It especially seems to have a hard time with causal ordering and with reasoning about dependencies in a distributed system. It frequently gets the relationships backwards, leading to hilarious code examples.
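To illustrate the kind of causal-ordering logic I mean (a made-up minimal sketch, not from the actual conversations; all names here are invented): a vector-clock delivery check for a distributed system, where the classic mistake is flipping one of the two comparisons so that messages get delivered before their causal dependencies arrive.

```python
def can_deliver(msg_clock, sender, local_clock):
    """Causal delivery rule: accept a message from `sender` only when
    (a) it is exactly the next message we expect from that sender, and
    (b) we have already seen everything the sender had seen from others.
    Getting either comparison backwards delivers messages before their
    causal dependencies -- the failure mode described above."""
    for node, count in msg_clock.items():
        if node == sender:
            # Must be exactly the next message from the sender.
            if count != local_clock.get(node, 0) + 1:
                return False
        else:
            # The sender must not have seen more from `node` than we have.
            if count > local_clock.get(node, 0):
                return False
    return True

local = {"A": 1, "B": 0}
# Message from B that causally depends on A's 2nd message (not yet seen here):
early = {"A": 2, "B": 1}
# Message from B with all causal dependencies satisfied:
ready = {"A": 1, "B": 1}

print(can_deliver(early, "B", local))  # False: dependency on A's msg 2 unmet
print(can_deliver(ready, "B", local))  # True
```

The subtlety is that the sender's own entry is compared with `==` while everyone else's uses `<=`; mixing those up (or reversing the direction of the inequality) is exactly the sort of backwards relationship the model kept producing.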