vLLM Blog

Model Runner V2: A Modular and Faster Core for vLLM

Mar 24, 2026·8 min read

How Model Runner V2 reworks vLLM's execution core with modular model logic, GPU-native input preparation, stable persistent batching, async-first scheduling, and no API changes.

#engineering
