Serves interactive, long-lived streaming video-generation sessions by jointly scheduling session placement and GPU autoscaling to meet tight per-chunk latency targets. Combines migration-aware placement, load-driven autoscaling, coalesced chunk processing, GPU–CPU offloading and NCCL-based GPU–GPU migration; reports reductions of ~37% in worst-case per-chunk latency and in GPU operating cost.
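
To make the interplay between placement and autoscaling concrete, here is a minimal sketch, not the system's actual algorithm. It assumes a fixed per-GPU session capacity, a best-fit placement rule, and a scale-in rule that drains the GPU needing the fewest migrations. All names (`Gpu`, `target_gpu_count`, `place_session`, `pick_gpu_to_drain`, `headroom`) are hypothetical and do not come from the summary.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gpu:
    gpu_id: int
    capacity: int  # assumed: max concurrent sessions this GPU can serve on time
    sessions: set = field(default_factory=set)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.sessions)


def target_gpu_count(active_sessions: int, capacity: int, headroom: float = 0.25) -> int:
    """Load-driven autoscaling: GPUs needed to host all sessions plus spare headroom."""
    if active_sessions == 0:
        return 0
    return math.ceil(active_sessions * (1 + headroom) / capacity)


def place_session(session_id: str, gpus: list) -> "int | None":
    """Best-fit placement: pick the fullest GPU that still has a free slot.

    Packing sessions tightly leaves other GPUs lightly loaded, so a later
    scale-in has to migrate fewer sessions. Returns None if no GPU has room.
    """
    candidates = [g for g in gpus if g.free_slots > 0]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: (g.free_slots, g.gpu_id))
    best.sessions.add(session_id)
    return best.gpu_id


def pick_gpu_to_drain(gpus: list) -> "int | None":
    """Scale-in: choose the GPU with the fewest sessions whose sessions fit elsewhere."""
    for victim in sorted(gpus, key=lambda g: (len(g.sessions), g.gpu_id)):
        spare = sum(g.free_slots for g in gpus if g is not victim)
        if spare >= len(victim.sessions):
            return victim.gpu_id
    return None
```

In this sketch a controller would compare `target_gpu_count(...)` against the current fleet size, call `place_session` for each arriving session, and call `pick_gpu_to_drain` before releasing a GPU. A real scheduler would also have to weigh the cost of each migration against the per-chunk latency budget, which is omitted here.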