Not exactly sure what you are looking for here. That GRPO works? > Group Relativ...

measurablefunc · 2026-01-21T07:03:38

None of those are novel domains w/ their own novel syntax & semantic validators, not to mention the dearth of readily available sources of examples for sampling the baselines. So again, where does it say it works for a programming language with nothing but a grammar & a compiler?

nl · 2026-01-21T12:21:53

To quote you:

> There is no RL for programming languages.

and

> Either RL works & you have evidence

This is just so completely wrong, and here is the evidence.

I think everyone in this thread is just surprised you don't seem to know this.

Haven't you seen the hundreds of job ads for people to write code for LLMs to train on?

measurablefunc · 2026-01-21T16:48:55

You're not going to get less confused by doubling down. None of your claims are valid & this is because you haven't actually tried to do what you're suggesting. Taking a grammar & compiler & RLing will get you nowhere.
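
For reference, the setup being argued about (sample a group of candidate programs, score each with a validator, use the group's own statistics as the baseline) can be sketched roughly as below. This only illustrates the group-relative advantage step of GRPO, not a training loop. `toy_validator` is a hypothetical stand-in for a real compiler (it just checks balanced parentheses), and the sample strings are made up. Nothing here is evidence either way about whether this works for a novel language.

```python
import statistics


def toy_validator(program: str) -> float:
    """Hypothetical stand-in for a compiler: 1.0 if parentheses balance, else 0.0."""
    depth = 0
    for ch in program:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing paren appeared before its opener.
                return 0.0
    return 1.0 if depth == 0 else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up group of sampled "programs" for one prompt.
samples = ["(a b)", "(a (b c))", "(a b", ")a("]
rewards = [toy_validator(s) for s in samples]
advantages = group_relative_advantages(rewards)

for s, r, a in zip(samples, rewards, advantages):
    print(f"{s!r:14} reward={r:.1f} advantage={a:+.2f}")
```

Note that if every sample in a group gets the same reward (for example, nothing passes the validator), every advantage is zero and that group contributes no learning signal.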