Tricks are nothing but patterns in the logical formulae we reduce. Ergo these ar...

myffical · 2026-03-28T22:38:22 1774737502

Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122

hodgehog11 · 2026-03-28T22:58:02 1774738682

This argument, that LLMs can develop crazy new strategies using RLVR on math problems (as happened with chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.

https://arxiv.org/abs/2504.13837

narrator · 2026-03-28T23:44:50 1774741490

The search space for the game of Go was also thought to be too large for computers to manage.

thesz · 2026-03-29T07:18:51 1774768731

It still is [1].

[1] https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...

stalfie · 2026-03-29T11:19:32 1774783172

The blind-spot-exploiting strategy you link to was found by an adversarial ML model...

sealeck · 2026-03-29T01:19:27 1774747167

Yes, and making a horse-drawn cart drive itself was thought to be impossible, so why don't we have faster-than-light travel yet...

Finbel · 2026-03-29T06:43:48 1774766628

Yes, but "the search space is too large" is something that has been said about innumerable AI problems that were then solved. So it's not unreasonable to doubt the merit of the statement when it's said for the umpteenth time.

hodgehog11 · 2026-03-29T07:27:19 1774769239

I should have been more specific then. The problem isn't that the search space is too large to explore. The problem is that the search space is so large that the training procedure actively prefers to restrict the search space to maximise short-term rewards, regardless of hyperparameter selection. There is a tradeoff here that could be ignored in the case of chess, but not for general math problems.

This is far from unsolvable. It just means that the "apply RL like AlphaGo" attitude is laughably naive. We need at least one more trick.

vatsachak · 2026-03-30T00:12:54 1774829574

The other trick could be bootstrapping through mathlib.

As you said, brute-forcing the search space as the starting procedure would take way too long for the AI to build intuition.

But if we could give it a million or so lemmas of human math, that would be a great starting point.

throwaway27448 · 2026-03-29T08:20:25 1774772425

I agree that LLMs are a bad fit for mathematical reasoning, but it's very hard for me to buy that humans are a better fit than a computational approach. Search will always beat our intuition.

hodgehog11 · 2026-03-29T11:47:14 1774784834

Yes and no. I think we have vastly underestimated the extent of the search space for math problems. I also think we underestimate the degree to which our worldview influences the directions in which we attempt proofs. Problems are derived from constructions that we can relate to, often physically. Consequently, the technique in the solution often involves a construction that is similarly physical in its form. I think measure theory is a prime example of this, and it effectively unlocked solutions to a lot of long-standing statistical problems.

ineedasername · 2026-03-29T18:04:57 1774807497

That linked article says it's about RLVR but then goes on to conflate other RL with it. It also doesn't address much of the core thinking in the paper it was partially responding to, published a month earlier [0]. That paper laid out its findings and theory reasonably well, including work that runs counter to the main criticism in the article you cited, i.e., performance at or above base models only being observed with low K.

That said, reachability and novel strategies are somewhat overlapping areas of consideration, and I don't see many ways in which RL in general, as mainly practiced, improves upon models' reachability. And even when it isn't clipping weights it's just too much of a black box approach.

But none of this takes away from the question of raw model capability on novel strategies, only from that question as it relates to RL.

[0] https://arxiv.org/pdf/2506.14245
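For anyone unfamiliar with the "low K" point above: pass@k is the probability that at least one of k sampled attempts solves a problem. Below is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k)/C(n, k), for n samples of which c are correct. The per-problem counts are made up purely to illustrate how a model can win at k=1 and lose at large k; they are not taken from either paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    # Probability that a random size-k subset contains no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(correct_counts, n, k):
    """Average pass@k over a list of per-problem correct counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)

# Hypothetical counts out of n=100 samples on two problems (illustration only).
base_model = [30, 2]   # rarely right, but solves both problems sometimes
rl_model = [90, 0]     # very reliable on one problem, never solves the other

for k in (1, 100):
    print(k, mean_pass_at_k(base_model, 100, k), mean_pass_at_k(rl_model, 100, k))
```

With these invented numbers the RL-tuned model is ahead at k=1 and behind at k=100, which is the shape of the disagreement: sharper sampling versus a wider set of reachable solutions.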

slopinthebag · 2026-03-28T21:33:02 1774733582

Stockfish's power comes mostly from search, and the ML techniques it uses are mainly about better search, i.e. pruning branches more efficiently.
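To make "pruning branches" concrete, here is a minimal sketch of textbook alpha-beta pruning on a toy tree. This is not Stockfish's code; the tree, the leaf scores, and the `visited` bookkeeping are all assumptions for illustration. Leaves are numbers standing in for a static evaluation, inner nodes are lists of children.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of a toy tree, skipping branches that cannot matter."""
    if not isinstance(node, list):
        # Leaf: pretend this number came from an evaluation function.
        if visited is not None:
            visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            # The opponent already has a better option elsewhere: prune.
            break
    return best

# Hypothetical two-ply tree: the maximizer picks a branch, the minimizer a leaf.
tree = [[3, 5], [2, 9], [0, 1]]
seen = []
result = alphabeta(tree, visited=seen)
print(result, seen)
```

In this toy tree only four of the six leaves get evaluated, because once the first branch guarantees a score of 3, any branch where the opponent can force something lower is abandoned early.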

vatsachak · 2026-03-28T21:37:16 1774733836

The weights must still have some understanding of the chess board, though there is always the chance that it makes no sense to us.

emp17344 · 2026-03-28T21:53:22 1774734802

Why must it involve understanding? I feel like you’re operating under the assumption that functionalism is the “correct” philosophical framework without considering alternative views.

slopinthebag · 2026-03-28T21:41:18 1774734078

Even that is probably too much. It has no understanding of what "chess" is, or what a chess board is, or even what a game is. And yet it crushes every human with ease. It's pretty nuts haha.

anematode · 2026-03-28T21:51:35 1774734695

Actually, the neural net itself is fairly imprecise. Search is required for it to achieve good play. Here's an example of me beating Stockfish 18 at depth 1: https://lichess.org/XmITiqmi

Sopel · 2026-03-28T21:54:20 1774734860

chess is just a simple mathematical construct so that's not surprising

PowerElectronix · 2026-03-29T10:29:56 1774780196

There is no understanding; the weights are selected based on better fit. Our cells don't understand optics just because they have eyes coded in their DNA.

hollerith · 2026-03-28T21:44:14 1774734254

Does Stockfish have weights or use a neural net? I know older versions did not.

Sopel · 2026-03-28T21:50:35 1774734635

Sopel · 2026-03-28T21:51:06 1774734666

The ML techniques it uses are only about evaluation, but you were close