LLM Agents

Tool use, agent runtime design, evaluation, context, and production patterns for systems that act across tools and environments.

Focus Areas

tool selection and orchestration
agent evaluation and trajectory design
context reuse and experience-augmented systems
secure agent deployment

Recommended Posts

Reproducing CompactRL: What Worked, What Failed, and Why We Did Not Scale

Updated: 24 Jul, 2026 · 16 min read

An auditable CompactRL reproduction spanning the public algorithm, a 96-step long-horizon simulation, integration with slime, real Qwen actor-critic training, value-function fixes, 17 experimental phases, and the evidence that stopped us from scaling.

From Long CoT to Agent Swarms: The Documented Evolution of Kimi's Reinforcement Learning

18 Jul, 2026 · 15 min read

A source-grounded history of Kimi's reinforcement-learning stack, from Kimi k1.5's long-context outcome RL and partial rollouts to K2's general RL and K2.5's multimodal GRMs and Parallel-Agent RL.

Training the Critic Without Crashing the Reward: A Practical Guide to Agentic RL

12 Jul, 2026 · 20 min read

A practical framework for critic training and credit assignment in long-horizon LLM agents: IQL, pairwise advantage, hindsight and counterfactual critics, privileged information, turn-level MDPs, chain-of-thought monitoring, and reward-crash diagnosis.

Scaling RL for White-Collar Work: The Environment Foundry

3 Jul, 2026 · 20 min read

A practical framework for turning common white-collar workflows into RL environments: spreadsheets, CRM tasks, customer support, web research, dashboards, and other software-mediated work.

How to Arbitrarily Increase the Difficulty of Agent Evaluation Sets

28 May, 2026 · 18 min read

A practical framework for making agent benchmarks harder in a controlled way: treat difficulty as trajectory-graph complexity, not prompt wording. Covers deterministic scoring, capability facets, harness effects, and systematic data generation.

Improving LLM Internationalization: Bridging the Gap in Tool Use and Agency

10 Mar, 2026 · 17 min read

LLMs achieve 57% tool-calling accuracy in English but only 34% across 52 languages, and 6.8% in the worst-performing language. This post covers the full playbook for closing the multilingual gap: training-time techniques, agentic architecture patterns, failure mode analysis, and RL-based approaches for i18n.

Experience-Augmented In-Context Learning: A Training-Free Complement to RL Post-Training

28 Feb, 2026 · 23 min read

RL post-training makes models smarter, but it can't cover the infinite long tail of real-world cases. Experience-augmented ICL retrieves successful reasoning traces at inference time, letting agents learn continuously from real usage with no retraining required.

Tool Selection Optimization for LLM Agents at Scale

9 Jan, 2026 · 18 min read

A deep technical dive into tool selection: retrieval strategies, context optimization, learned selection, and the engineering trade-offs that matter when scaling to hundreds of tools.
