Predicts per-request Mixture-of-Experts (MoE) expert footprints from prefill activations and routes each decode request to the worker that maximizes expert locality. Lowers decode latency by combining offline K-means partitioning with online locality-band routing and a signature cache co-indexed with KV blocks.
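
The routing idea can be illustrated with a minimal sketch. It assumes a predicted footprint is simply a set of expert IDs and that each worker exposes the set of experts it holds; the overlap-fraction score, the tie-breaking rule, and the names `locality_score` and `route_request` are hypothetical and chosen only for illustration. The K-means partitioning, locality bands, and signature cache are not modeled here.

```python
from typing import Dict, Set


def locality_score(footprint: Set[int], resident: Set[int]) -> float:
    """Fraction of the predicted experts that are already resident on a worker.

    Assumption: plain set overlap stands in for whatever locality metric
    the actual system uses.
    """
    if not footprint:
        return 0.0
    return len(footprint & resident) / len(footprint)


def route_request(footprint: Set[int], workers: Dict[str, Set[int]]) -> str:
    """Return the name of the worker with the highest locality score.

    Ties are broken by worker name (lexicographically smallest wins),
    which is an arbitrary illustrative choice.
    """
    if not workers:
        raise ValueError("no workers available")
    # max() keeps the first maximal element, so iterating in sorted order
    # makes tie-breaking deterministic.
    return max(sorted(workers), key=lambda w: locality_score(footprint, workers[w]))
```

For example, with two workers holding experts `{0, 1, 2, 3}` and `{4, 5, 6, 7}`, a request whose predicted footprint is `{1, 2, 5}` would go to the first worker, since two of its three predicted experts are resident there.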