Hacker News

It seems like we're hitting a solid plateau of LLM performance with only slight changes each generation. The jumps between versions are getting smaller. When will the AI bubble pop?


SWE-bench Pro is ~20% higher than the previous .1 generation, which was released 2 months ago. On their SWE benchmark, token consumption at iso-performance is down 2x from the model they released 2 months ago.

If this is a plateau I struggle to imagine what you consider fast progress.


Your comment doesn't make any sense: Opus 4.6 was released two months ago, so what jump would you expect?


Every night praying for tomorrow


The generations are two months apart now though…

