Liquid AI has introduced LFM2-2.6B-Exp, an experimental checkpoint of its LFM2-2.6B language model that is trained with pure reinforcement learning […]
Category: Reinforcement Learning
How We Learn Step-Level Rewards from Preferences to Solve Sparse-Reward Environments Using Online Process Reward Learning
In this tutorial, we explore Online Process Reward Learning (OPRL) and demonstrate how we can learn dense, step-level reward signals […]
How to Build, Train, and Compare Multiple Reinforcement Learning Agents in a Custom Trading Environment Using Stable-Baselines3
In this tutorial, we explore advanced applications of Stable-Baselines3 in reinforcement learning. We design a fully functional, custom trading environment, […]
Sigmoidal Scaling Curves Make Reinforcement Learning (RL) Post-Training Predictable for LLMs
Reinforcement Learning (RL) post-training is now a major lever for reasoning-centric LLMs, but unlike pre-training, it hasn’t had predictive scaling […]
NVIDIA Researchers Propose Reinforcement Learning Pretraining (RLP): Reinforcement as a Pretraining Objective for Building Reasoning During Pretraining
NVIDIA AI has introduced Reinforcement Learning Pretraining (RLP), a training objective that injects reinforcement learning into the pretraining stage rather […]
MoonshotAI Released Checkpoint-Engine: A Simple Middleware to Update Model Weights in LLM Inference Engines, Effective for Reinforcement Learning
MoonshotAI has open-sourced checkpoint-engine, a lightweight middleware aimed at solving one of the key bottlenecks in large language model (LLM) […]
Is AI really trying to escape human control and blackmail people?
Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs, and why we fall for it. […]
NVIDIA AI Releases ProRLv2: Advancing Reasoning in Language Models with Extended Reinforcement Learning (RL)
Table of contents: What Is ProRLv2? · Key Innovations in ProRLv2 · How ProRLv2 Expands LLM Reasoning · Why It Matters · Using Nemotron-Research-Reasoning-Qwen-1.5B-v2 […]
How a big shift in training LLMs led to a capability explosion
Reinforcement learning, explained with a minimum of math and jargon. […]
Meta Researchers Introduced J1: A Reinforcement Learning Framework That Trains Language Models to Judge With Reasoned Consistency and Minimal Data
Large language models are now being used for evaluation and judgment tasks, extending beyond their traditional role of text generation. […]
