LLM Agents
Tool use, agent runtime design, evaluation, context, and production patterns for systems that act across tools and environments.
Focus Areas
- tool selection and orchestration
- agent evaluation and trajectory design
- context reuse and experience-augmented systems
- secure agent deployment
Recommended Posts
- How to Arbitrarily Increase the Difficulty of Agent Evaluation Sets
  · 18 min read
  A practical framework for making agent benchmarks harder in a controlled way: treat difficulty as trajectory-graph complexity, not prompt wording. Covers deterministic scoring, capability facets, harness effects, and systematic data generation.
- Improving LLM Internationalization: Bridging the Gap in Tool Use and Agency
  · 17 min read
  LLMs achieve 57% tool-calling accuracy in English but only 34% across 52 languages, and 6.8% for the worst. This post covers the full playbook for closing the multilingual gap: training-time techniques, agentic architecture patterns, failure mode analysis, and RL-based approaches for i18n.
- Experience-Augmented In-Context Learning: A Training-Free Complement to RL Post-Training
  · 23 min read
  RL post-training makes models smarter, but it can't cover the infinite long tail of real-world cases. Experience-augmented ICL retrieves successful reasoning traces at inference time, letting agents learn continuously from real usage with no retraining required.
- Tool Selection Optimization for LLM Agents at Scale
  · 18 min read
  A deep technical dive into tool selection: retrieval strategies, context optimization, learned selection, and the engineering trade-offs that matter when scaling to hundreds of tools.