Tag: Systems
All the articles with the tag "Systems".
- Optimizing Inference for Router Looped Transformers
Updated: · 17 min read
A research note on serving router looped transformers: why normal KV cache semantics break, what latency data says so far, and how vLLM or SGLang could be adapted with route-template batching and virtual-step KV cache.