Tag: RLHF
All the articles with the tag "RLHF".
-
The Unverifiable Reward Problem: The Real Frontier of RL for LLMs
· 11 min read
Deep research on RL tasks with unverifiable rewards, the key bottleneck for scaling RL beyond math and code. Covers JEPO, NRT, RLNVR, self-play methods, GenRM, Constitutional AI, reward hacking mitigation, and more.
-
Adding Ads in LLM/Chatbot: Character Training for Monetization
· 4 min read
Exploring how to integrate ads into LLMs through character training, so that recommendations are genuinely helpful rather than annoyingly promotional.
-
RLHF from an Engineering Perspective: PPO, GRPO, DPO, and Tool-Use Implementation
· 12 min read
A practical engineering guide to RLHF implementation, covering PPO, GRPO, DPO, and tool-use training, with code snippets and debugging tips.
-
Post-Training Is Not 'One Algorithm': Objective Functions and Implementation Essentials for PPO / DPO / GRPO
· 12 min read
Reading notes on RLHF covering PPO, DPO, and GRPO: understanding post-training as an engineering pipeline rather than a single algorithm.