RLHF and Preference Optimization

Engineering notes and research synthesis on PPO, DPO, GRPO, reward modeling, preference data, and model behavior optimization.

Focus Areas

Recommended Posts