TurboQuant Explained: 3-Bit KV Cache at 6× Compression

Last updated: March 2026

TurboQuant is a vector quantization algorithm from Google Research (ICLR 2026) that compresses LLM key-value caches to 3 bits per coordinate with zero accuracy loss. It combines PolarQuant, a rotation-based coordinate transform, with a 1-bit QJL residual correction, achieving at least 6× memory reduction and up to 8× faster …
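
To make the "1-bit QJL" idea concrete, here is a minimal sketch of a sign-bit random-projection inner-product estimator, assuming QJL refers to the Quantized Johnson-Lindenstrauss transform. It is an illustration under stated assumptions, not TurboQuant's implementation. The function names (`qjl_encode`, `qjl_inner_product`, `demo`), the dimension, and the projection count are all hypothetical. It also does not show how TurboQuant applies the correction to the PolarQuant residual.

```python
import math
import random

def qjl_encode(key, projections):
    """Keep only the sign bit of each random projection, plus the key's norm."""
    signs = [1 if sum(s * x for s, x in zip(row, key)) >= 0 else -1
             for row in projections]
    norm = math.sqrt(sum(x * x for x in key))
    return signs, norm

def qjl_inner_product(query, signs, norm, projections):
    """Estimate <query, key> from the 1-bit code; the query stays full precision."""
    m = len(projections)
    acc = sum(bit * sum(s * x for s, x in zip(row, query))
              for bit, row in zip(signs, projections))
    # For Gaussian rows s, E[sign(s.k) * (s.q)] = sqrt(2/pi) * <q, k> / ||k||,
    # so rescaling by sqrt(pi/2) * ||k|| gives an unbiased estimate.
    return math.sqrt(math.pi / 2) * norm * acc / m

def unit(vec):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]

def demo(dim=16, m=4096, seed=0):
    """Compare the exact inner product with the 1-bit estimate (illustrative sizes)."""
    rng = random.Random(seed)
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]
    key = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
    query = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
    exact = sum(a * b for a, b in zip(query, key))
    signs, norm = qjl_encode(key, projections)
    approx = qjl_inner_product(query, signs, norm, projections)
    return exact, approx

if __name__ == "__main__":
    exact, approx = demo()
    print(f"exact={exact:.4f} estimate={approx:.4f}")
```

The sketch stores one bit per projection plus a single norm, and the estimate tightens as the number of projections grows. The large projection count here is chosen only to make the demo stable and says nothing about the bit budget TurboQuant actually uses.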