The High-Throughput and Memory-Efficient inference and serving engine for LLMs

Easy, fast, and cost-efficient LLM serving for everyone.

Easy

Deploy the widest range of open-source models on any hardware. Includes a drop-in OpenAI-compatible API for instant integration.

Fast

Maximize throughput with PagedAttention. Advanced scheduling and continuous batching ensure peak GPU utilization.

Cost Efficient

Slash inference costs by maximizing hardware efficiency. We make high-performance LLMs affordable and accessible to everyone.

Quick Start

Select your preferences and run the install command. Stable represents the most currently tested and supported version of vLLM. Nightly is available if you want the latest builds.

📦 Requires Python 3.10+. Python 3.12+ recommended.

⚡ We recommend uv for faster and more reliable installation.

🔧 For other platforms, see docs.vllm.ai

🎉 See what's new in

🔍 Find which release contains a PR

Build

StableNightly

Platform

CUDAROCmXPUCPU

Package

Python (uv)PythonDocker

CUDA Version

CUDA 13.0CUDA 12.9

Run this Command:

uv pip install vllm --torch-backend auto

💡 Compatible with all CUDA 13.x versions (13.0 - 13.1) · Troubleshooting

Looking for older versions?

Universal Compatibility

One engine, endless possibilities. Run any model on any hardware.

Hardware

Unified API across platforms

AWSNeuron Accelerator

Intel

Open Models

Latest trending open-source models, optimized & production-ready

DeepSeek

DeepSeek V4DeepSeek V3.2DeepSeek R1

Google

Gemma 4Gemma 3

Got questions?
We're here to help.

Whether you're just getting started or debugging a complex deployment, our community is open to everyone. No question is too basic!

Fast & friendly responses

Active maintainers

Join Slack

Real-time help & discussions

Visit Forum

Searchable Q&A knowledge base

GitHub Issues

Bug reports & feature requests

Resources

Explore recipes, benchmarks, and roadmap

Recipes

Example notebooks and tutorials

recipes.vllm.ai

Performance

Benchmarks and comparisons

perf.vllm.ai

Roadmap

Project roadmap and milestones

roadmap.vllm.ai

The High-Throughput and Memory-Efficient inference and serving engine for LLMs

Easy

Fast

Cost Efficient

Quick Start

Sponsors

Cash Donations

Compute Resources

Slack Sponsor

Universal Compatibility

Hardware

Open Models

Got questions?
We're here to help.

Resources

Recipes

Performance

Roadmap

The High-Throughput and Memory-Efficient inference and serving engine for LLMs

Easy

Fast

Cost Efficient

Quick Start

Sponsors

Cash Donations

Compute Resources

Slack Sponsor

Universal Compatibility

Hardware

Open Models

Got questions?We're here to help.

Resources

Recipes

Performance

Roadmap

Got questions?
We're here to help.