2 posts · updated may 2026

// writing
Putting a model in charge of KV-cache memory
Building an RL environment to learn how KV-cache eviction works in LLM serving systems

Speeding up diffusion models with first block caching
How to speed up diffusion inference with minimal quality loss using first block caching