vLLM Blog

Accelerating Laguna XS.2 Inference with vLLM, Speculators, and LLM Compressor

May 28, 2026 · 3 min read

How Laguna XS.2 is served and optimized in vLLM using first-class model integration, a DFlash speculator trained with Speculators, and FP8, NVFP4, INT4, and INT8 checkpoints from LLM Compressor.

#llm-compressor
