Hacker News | aluminum96's comments

“they must be lying because I personally dislike them”

This is why HN threads about AI have become exhausting to read


In general I agree with you, but I do see the point of requiring proof for their statements instead of accepting them at face value. Given past experience, and considering that they benefit if these statements are believed, shouldn't the burden of proof be on those making the claims rather than on those questioning them?

Those models seem to be special and not part of their normal product line, as pointed out in the comments here. I would assume they were created with passing these tests in mind. Or were they created for something different, and the team discovered only by chance, unintentionally, that they could be used for the challenge?


Yeah, that's how the concept of "reputation" works.


No, they are likely lying, because they have huge incentives to lie


OpenAI explicitly stated that it is natural language only, with no tools such as Lean.

https://x.com/alexwei_/status/1946477745627934979?s=46&t=Hov...


Why do people keep making up controversial claims like this? There is no evidence at all to this effect


it was widely covered in the press earlier in the year


Source?


Mark Chen posted that the system was locked before the contest. [1] It would obviously be crazy cheating to give verifiers a solution to the problem!

[1] https://x.com/markchen90/status/1946573740986257614?s=46&t=H...


The proofs were published on GitHub for inspection, along with some details (generated within the time limit, by a system locked before the problems were released, with no external tools).

https://github.com/aw31/openai-imo-2025-proofs/tree/main


The solutions were publicly posted to GitHub: https://github.com/aw31/openai-imo-2025-proofs/tree/main


Did humans formalize the inputs, or was the exact natural-language input provided to the LLM? A lot of detail is missing on the methodology used, not to mention any independent validation.

My skepticism stems from the past FrontierMath announcement, which turned out to be a bluff.


People are reading a lot into the FrontierMath articles from a couple months ago, but to be honest I don't really understand what the controversy there is supposed to be. Failing to clearly disclose that they sponsored Epoch to build the benchmark clearly doesn't affect a model's performance on it.


What, you mean your fruit preferences don't form a total order?


Of course they do, but in this example there's no way to compare cherries to bananas.

Grapefruit is of course the best fruit.
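The joke above can be made concrete: a preference relation with an incomparable pair (cherries vs. bananas) is a partial order, not a total one. A minimal sketch, with an entirely hypothetical preference relation:

```python
# Hypothetical fruit-preference relation, given as direct "a preferred to b" pairs.
# Cherries and bananas are deliberately left incomparable.
PREFERS = {
    ("grapefruit", "cherries"),
    ("grapefruit", "bananas"),
    ("grapefruit", "apples"),
    ("cherries", "apples"),
    ("bananas", "apples"),
}

def comparable(a, b):
    """True if the relation orders a and b one way or the other (or a == b)."""
    return a == b or (a, b) in PREFERS or (b, a) in PREFERS

def is_total(items):
    """A total order requires every pair of items to be comparable."""
    return all(comparable(a, b) for a in items for b in items)

fruits = ["grapefruit", "cherries", "bananas", "apples"]
print(comparable("cherries", "bananas"))  # False: the incomparable pair
print(is_total(fruits))                   # False: only a partial order
```

With cherries and bananas unordered, `is_total` fails, which is exactly why "rank your favorite fruits" doesn't always have a single answer.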


Just vaporized a whole team so the roles can be moved overseas :(


full-timers or contractors?


SF Voters rejected Proposition A in 2022 [1], which would have included funding to upgrade Muni's control systems (among many other projects). We'll eventually have to find the money somewhere else when the system fails.

[1] https://www.sfchronicle.com/sf/article/S-F-voters-narrowly-r...


Google needs much stronger SVP-level product leadership. Directors and VPs should not be fighting for product turf, and major user-facing products, such as an entire default Android app in this case, need to outlast the tenure of any individual VP-level patron.


I get that upper management is playboy billionaires who have checked out at this point, but it would be basic management to look at this churn and think "why aren't we conserving these codebases?" Fundamentally this is just branding and reskinning.

