I also don't understand why MindsAI is included. ARC is supposed to grade LLMs on their ability to generalize, i.e. the higher the score, the more useful they are. If MindsAI scores 2x the current SOTA, then why are we wasting our $20 on inferior LLMs like ChatGPT and Claude when we could be using the one-true-god MindsAI?
If the answer is "because it's not a general-purpose LLM", then why is ARC marketed as the ultimate benchmark, the litmus test for AGI (I know, I know: passing ARC doesn't mean AGI, only the converse holds, that failing it means no AGI)?
ARC was never supposed to grade LLMs! I designed the ARC format back when LLMs weren't a thing at all. It's a test of AI systems' ability to generalize to novel tasks.
Hello, Francois! My question isn't related directly to the big news, but to a lecture you gave recently https://www.youtube.com/watch?v=s7_NlkBwdj8&ab_channel=Machi...
At 20:45 you say "So you cannot prepare in advance for ARC. You cannot just solve ARC by memorizing the solutions in advance."
And at 24:45 "There's a chance that you could achieve this score by purely memorizing patterns and reciting them."
Isn't that a contradiction? The way I understand it, on the one hand you are saying ARC can't be memorized, and on the other you are saying it can.