Post-Training

SFT, RLHF, preference optimization, instruction following, reasoning traces, and data pipelines for shaping model behavior after pretraining.

Focus Areas

Recommended Posts