Archives: Lloyd-Max quantizer

TurboQuant Explained: 3-Bit KV Cache at 6× Compression

2026-03-28 by Ignacy

Last updated: March 2026

TurboQuant is a vector quantization algorithm from Google Research (ICLR 2026) that compresses LLM key-value caches to 3 bits per coordinate with no reported accuracy loss. It combines PolarQuant, a rotation-based coordinate transform, with a 1-bit QJL residual correction, achieving at least 6× memory reduction and up to 8× faster …