
The State of FP8 KV-Cache and Attention Quantization in vLLM
21 min read
What validating the FP8 KV cache in vLLM found across Hopper and Blackwell GPUs, covering attention quantization, Flash Attention 3 fixes, memory savings, decode speedups, and which layers to skip.