Those products show that OpenAI was innovating and leading in RL around 2017 to 2019.
https://github.com/openai/gym
https://en.wikipedia.org/wiki/OpenAI_Five
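DeepSeek's GRPO is also just a minor variant of PPO: it keeps the clipped policy-ratio objective and swaps the learned value baseline for one computed from a group of sampled responses to the same prompt.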
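
To make the "minor variant" point concrete, here is a rough sketch of the one piece that differs. The function names, the `eps` constants, and the use of population standard deviation are my own assumptions for illustration, not taken from any DeepSeek or OpenAI code. The clipped surrogate is the standard PPO one; the group-relative advantage is what GRPO uses where PPO would use a critic-based estimate.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds one scalar reward per sampled response to the same prompt.
    No learned value function is involved; the group mean is the baseline.
    (eps and population std are assumptions of this sketch.)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO clipped objective for one sample (to be maximized).

    `ratio` is pi_new(a|s) / pi_old(a|s). Both PPO and GRPO use this term.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objective = [clipped_surrogate(1.5, a) for a in advantages]
```

GRPO also adds a KL penalty against a reference policy directly to the loss, which this sketch leaves out.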