Proximal Policy Optimization (PPO)
Term
Proximal Policy Optimization (PPO) is a reinforcement-learning algorithm introduced by OpenAI in 2017. It updates a policy by maximizing a clipped surrogate objective, which limits how far each update can move the policy away from the previous one and so improves training stability. It is widely used as a baseline method for training agents, including the large-language-model agents discussed in recent research on agentic RL.
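As a rough illustration of the clipped surrogate objective mentioned above, the following minimal sketch computes it for a small batch in plain Python. The function name `ppo_clip_objective`, its parameter names, and the default `clip_eps=0.2` are assumptions chosen for this example rather than part of any particular library. A real implementation would typically operate on tensors and add value-function and entropy terms.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the previous policy. advantages: advantage estimates.
    All names here are illustrative assumptions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the two terms means the objective offers no extra reward for pushing the probability ratio outside the clip range in the direction the advantage favors, which is what keeps each update close to the previous policy.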
1 story
First: 2026-03-06
Last: 2026-03-06
Wikipedia