
vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation
16 min read
How to run vLLM on NVIDIA DGX Spark and GB10 systems, including unified memory behavior, NVFP4 Nemotron-3-Super serving, Docker deployment, Prometheus metrics, and local evaluation results.