Tag: RLHF
All articles tagged "RLHF".
-
The Mercor Breach: What 4TB of Stolen Data Reveals About How Frontier AI Labs Actually Train Models
· 22 min read
A $10B AI data vendor was breached, exposing 84 Airtable workspaces of training data for OpenAI, Anthropic, Apple, Amazon, and Meta. This post analyzes what the public reporting reveals about each lab's evaluation methodology (rubric design, RLHF pipelines, and quality control) and what it means for the industry.
-
The Unverifiable Reward Problem: The Real Frontier of RL for LLMs
· 11 min read
Deep research on tasks with unverifiable rewards in RL, the key bottleneck for scaling RL beyond math and code. Covers JEPO, NRT, RLNVR, self-play methods, GenRM, Constitutional AI, reward hacking mitigation, and more.
-
Adding Ads in LLM/Chatbot: Character Training for Monetization
· 4 min read
Exploring how to integrate ads into LLMs through character training, making recommendations genuinely helpful rather than annoyingly promotional.
-
Post-Training Is Not 'One Algorithm': Objective Functions and Implementation Essentials for PPO / DPO / GRPO
· 12 min read
Reading notes on RLHF covering PPO, DPO, and GRPO: understanding post-training as an engineering pipeline rather than a single algorithm.
-
RLHF from an Engineering Perspective: PPO, GRPO, DPO, and Tool-Use Implementation
· 12 min read
A practical engineering guide to RLHF implementation, covering PPO, GRPO, DPO, and tool-use training, with code snippets and debugging tips.