vLLM Blog

vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation

Jun 1, 2026·16 min read

How to run vLLM on NVIDIA DGX Spark and GB10 systems, including unified memory behavior, NVFP4 Nemotron-3-Super serving, Docker deployment, Prometheus metrics, and local evaluation results.

#nemotron
