Evaluation

Practical approaches to measuring model and agent capability with deterministic checks, rubrics, trajectories, and verifiable outcomes.

Focus Areas

Recommended Posts