TurboQuant [3], an online vector quantization method, drew wide public attention at ICLR 2026. To me, it looked very familiar: it overlaps heavily with EDEN, a quantization method first introduced as the 1-bit method DRIVE at NeurIPS 2021 [1] and generalized to arbitrary bit-widths at ICML 2022 [2], which I co-authored with Ran Ben-Basat, Yaniv Ben-Itzhak, Gal Mendelson, Michael Mitzenmacher, and Shay Vargaftik.
The TurboQuant paper presents two variants: TurboQuant-mse and TurboQuant-prod. In a detailed new comparison [5] we show that TurboQuant-mse is a degenerate case of EDEN, and that the EDEN variants consistently outperform their counterparts.
How EDEN quantizes a vector
Suppose you need to compress a $d$-dimensional vector $x$ (a gradient update, an embedding, a KV-cache entry) down to a few bits per coordinate. EDEN proceeds in four steps, summarized in a single formula right after the list:
- Random rotation — Multiply by a random orthogonal matrix $\Pi$. After rotation the coordinates are identically distributed and, for large $d$, approximately Gaussian.
- Scalar quantization — Round each rotated coordinate to one of $2^b$ levels from a Lloyd–Max codebook trained on the known rotated coordinate distribution ($b$ is the target number of bits per coordinate).
- Scale — Multiply by a scale factor $S$.
- Inverse rotation — Apply $\Pi^\top$ to recover an approximation $\hat{x}$ of the original vector.
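In symbols (our shorthand, writing $Q_b$ for coordinate-wise rounding to the nearest of the $2^b$ Lloyd–Max levels), the reconstruction is

$$
\hat{x} \;=\; S \,\Pi^\top Q_b(\Pi x).
$$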
While earlier works (e.g., Suresh et al. (2017) [6]) used rotation mainly to shrink the coordinates’ dynamic range (the gap between the largest and smallest coordinate value), EDEN [1] was, to the best of our knowledge, the first quantization scheme to exploit a stronger fact about random rotation: the post-rotation coordinates follow a known distribution. This lets us pair a deterministic quantizer with a closed-form scale that, depending on the application, either minimizes MSE or makes the estimate unbiased. Both scales are derived analytically, and the construction yields an asymptotic MSE reduction over the earlier range-based approach.
Concretely, EDEN’s two variants differ only in the choice of $S$:
- EDEN-biased — sets $S$ to the closed-form value that minimizes the reconstruction MSE.
- EDEN-unbiased — chooses $S$ so the decompressed output is correct on average ($\mathbb{E}[\hat{x}] = x$), which matters particularly whenever you average many quantized vectors (e.g., distributed training, attention).
Lined up against EDEN, TurboQuant-mse matches at every step except one: EDEN derives the scale $S$ analytically, while TurboQuant-mse, although it targets MSE minimization, skips the optimized scaling.
The sketch below shows the three side by side.
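The code is ours rather than code from either paper: a minimal NumPy illustration that assumes a dense random rotation (real implementations typically use fast structured rotations), a Gaussian Lloyd–Max codebook fitted by a short k-means on samples, and per-vector least-squares and bias-correcting scale expressions as stand-ins for the closed-form scales derived in [1, 2]. Following [5], TurboQuant-mse is modeled as the same pipeline with $S$ fixed to 1.

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def lloyd_max_levels(bits, n_samples=100_000, iters=40, seed=0):
    """Approximate Lloyd-Max codebook for a standard normal (k-means on samples)."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n_samples)
    levels = np.quantile(s, (np.arange(2 ** bits) + 0.5) / 2 ** bits)  # initialization
    for _ in range(iters):
        idx = np.abs(s[:, None] - levels[None, :]).argmin(axis=1)     # nearest level
        for k in range(levels.size):
            if np.any(idx == k):
                levels[k] = s[idx == k].mean()                         # recenter
    return np.sort(levels)

def random_rotation(d, rng):
    """Random orthogonal matrix via QR of a Gaussian matrix (sign-fixed columns)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))

def quantize(x, bits, rng, variant="eden-biased"):
    """Rotate -> Lloyd-Max round -> scale -> rotate back, for one input vector.

    variant is 'eden-biased', 'eden-unbiased', or 'turboquant-mse' (S = 1).
    """
    d = x.shape[0]
    P = random_rotation(d, rng)                    # in practice: a fast structured rotation
    r = P @ x                                      # step 1: rotate
    sigma = np.linalg.norm(x) / np.sqrt(d)         # rotated coords are roughly N(0, sigma^2)
    levels = lloyd_max_levels(bits)
    q = levels[np.abs(r[:, None] / sigma - levels[None, :]).argmin(axis=1)]
    Q = sigma * q                                  # step 2: quantized rotated vector
    if variant == "eden-biased":
        S = (r @ Q) / (Q @ Q)                      # step 3a: least-squares (min-MSE) scale
    elif variant == "eden-unbiased":
        S = (r @ r) / (r @ Q)                      # step 3b: bias-correcting scale
    else:                                          # 'turboquant-mse'
        S = 1.0                                    # no optimized scaling
    return S * (P.T @ Q)                           # step 4: rotate back
```

A few Lloyd iterations on samples are enough for illustration here; the papers derive the codebooks and scales analytically for the known post-rotation distribution.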

Why the optimal scale is worth it
The value of applying the proper scale $S$ grows with bit-width. At $b = 1$ bit, the gap is marginal. At $d = 128$ and $b = 4$ bits, EDEN-biased reduces MSE by 2.25% over TurboQuant-mse, and these are the bit-widths practitioners actually use for embeddings and KV caches.
Across dimensions 16 to 4096 and all tested bit-widths $b \in \{1, 2, 3, 4\}$, EDEN-biased’s vNMSE (vector-normalized MSE, $\mathbb{E}[\|x - \hat{x}\|^2] / \|x\|^2$) falls below TurboQuant-mse’s in every case (Figure 2). As the dimension grows very large, the optimal $S$ approaches 1 and the two algorithms converge, but at practical dimensions (128–1024) the gap persists.
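For readers who want to reproduce this kind of curve, a small Monte Carlo harness is enough. The snippet below is a sketch that assumes the `quantize` function (and the NumPy import) from the earlier sketch; dimensions, bit-widths, and trial counts are illustrative.

```python
def vnmse(variant, d=128, bits=4, trials=200, seed=1):
    """Monte Carlo estimate of E[||x - x_hat||^2] / ||x||^2 for one variant."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        x = rng.standard_normal(d)
        x_hat = quantize(x, bits, rng, variant=variant)
        errs.append(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))
    return float(np.mean(errs))

# Example: compare the min-MSE variants at d = 128, b = 4.
# vnmse("eden-biased") vs. vnmse("turboquant-mse")
```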

Unbiased compression: saving more than a full bit
The results above concern the biased (MSE-minimizing) variants. Now consider the unbiased case, where applications such as distributed training, approximate attention, or inner-product retrieval need $\mathbb{E}[\hat{x}] = x$ because they average many quantized vectors.
EDEN-unbiased uses the same single-pass algorithm as EDEN-biased, just with $S$ chosen for bias correction. TurboQuant’s unbiased variant, TurboQuant-prod, takes a different route: it spends $(b-1)$ bits on the biased TurboQuant-mse step and reserves 1 bit for a QJL (Quantized Johnson–Lindenstrauss) [4] correction on the residual (QJL is similar to EDEN at $b = 1$, but with higher variance).
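For intuition on why such a residual correction restores unbiasedness (our notation, not the paper’s): write $\hat{x}_{\mathrm{mse}}$ for the decoded output of the $(b-1)$-bit stage and condition on its randomness; if the 1-bit stage returns an unbiased estimate of the residual $x - \hat{x}_{\mathrm{mse}}$, then

$$
\mathbb{E}[\hat{x}]
\;=\; \hat{x}_{\mathrm{mse}} + \mathbb{E}\!\left[\widehat{x - \hat{x}_{\mathrm{mse}}}\right]
\;=\; \hat{x}_{\mathrm{mse}} + \left(x - \hat{x}_{\mathrm{mse}}\right)
\;=\; x,
$$

and unbiasedness over all the randomness follows by the tower rule.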
EDEN-unbiased outperforms TurboQuant-prod in every tested configuration, and by a substantial margin. The gap traces to three structural advantages of EDEN’s single-pass design:
- EDEN optimizes the scale. TurboQuant-prod inherits TurboQuant-mse’s $S = 1$ first stage, so it carries the same MSE penalty.
- EDEN’s 1-bit construction has lower variance than QJL. In large dimensions, EDEN’s 1-bit vNMSE converges to $\pi/2 - 1 \approx 0.57$ [1], while QJL’s converges to $\pi/2 \approx 1.57$ [4], roughly 2.75× higher (a back-of-the-envelope for EDEN’s limit follows this list).
- EDEN spends the full bit budget on a single unbiased quantizer. TurboQuant-prod splits the budget into $(b-1)$ biased bits plus 1 residual bit, which empirically underperforms spending all $b$ bits on a single unbiased quantizer [5].
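The back-of-the-envelope for EDEN’s 1-bit limit (a heuristic sketch in our notation; the formal statement and proof are in [1]): 1-bit EDEN-unbiased sends $\mathrm{sign}(\Pi x)$ together with a bias-correcting scale, which at 1 bit works out to $S = \|x\|^2 / \langle \Pi x, \mathrm{sign}(\Pi x)\rangle = \|x\|^2 / \|\Pi x\|_1$, so $\hat{x} = S\,\Pi^\top \mathrm{sign}(\Pi x)$. For large $d$ the rotated coordinates behave like i.i.d. Gaussians with standard deviation $\|x\|/\sqrt{d}$, hence $\|\Pi x\|_1 \approx \sqrt{2d/\pi}\,\|x\|$, which gives $\langle x, \hat{x}\rangle = S\,\|\Pi x\|_1 = \|x\|^2$ and $\|\hat{x}\|^2 = S^2 d \approx \tfrac{\pi}{2}\,\|x\|^2$. Therefore

$$
\frac{\|x - \hat{x}\|^2}{\|x\|^2}
= \frac{\|x\|^2 + \|\hat{x}\|^2 - 2\,\langle x, \hat{x}\rangle}{\|x\|^2}
\;\approx\; 1 + \frac{\pi}{2} - 2
\;=\; \frac{\pi}{2} - 1 \;\approx\; 0.57 .
$$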
The three effects compound. The result: 1-bit, 2-bit, and 3-bit EDEN-unbiased are each more accurate than 2-bit, 3-bit, and 4-bit TurboQuant-prod, respectively (Figure 3). By swapping in EDEN you can drop a bit per coordinate and still match TurboQuant-prod’s accuracy.

On TurboQuant’s own benchmarks
The same picture holds on the standard ANN benchmarks TurboQuant evaluates on (Stanford’s GloVe pre-trained word vectors, under the Open Data Commons Public Domain Dedication and License v1.0, and Qdrant’s dbpedia-entities-openai3-text-embedding-3-large embeddings, under Apache 2.0), using TurboQuant’s published evaluation code:
EDEN-biased achieves lower MSE than TurboQuant-mse, EDEN-unbiased achieves markedly lower inner-product error than TurboQuant-prod, and nearest-neighbor recall on both datasets favors EDEN (Figure 4).

Takeaway: use EDEN; optimal scaling matters
EDEN’s scale connects the known post-rotation distribution to an analytically optimal quantizer. TurboQuant-mse keeps EDEN’s rotation and the codebook but pins $S = 1$, which is what makes it a strictly weaker special case. TurboQuant-prod adds a 1-bit QJL stage on top of that to restore unbiasedness, whereas EDEN-unbiased gets the same property, with better accuracy, simply by picking a bias-correcting scale.
- For MSE-targeted compression (model weight quantization, nearest-neighbor search, KV cache): EDEN-biased computes the optimal scale $S$ and consistently beats TurboQuant-mse (which is EDEN with $S = 1$ fixed).
- For unbiased estimation (distributed mean estimation, approximate attention, inner-product retrieval): EDEN-unbiased substantially outperforms TurboQuant-prod’s bit-splitting strategy, by margins worth more than a full bit per coordinate.
EDEN was originally developed for distributed mean estimation in federated and distributed training. Subsequent work has, for example, applied it to embedding compression for document re-ranking (SDR, 2022 [8]), adapted it for NVFP4 LLM training (MS-EDEN in Quartet II, 2026 [10]), and generalized it to vector quantization for data-free LLM weight compression (HIGGS, 2025 [9]), which was in turn used for KV-cache compression (AQUA-KV, 2025 [11]).
EDEN implementations are available in PyTorch and TensorFlow, in Intel’s OpenFL [7], and, for the 1-bit variant, in Google’s FedJax, TensorFlow Federated, and TensorFlow Model Optimization.
For the full technical comparison with TurboQuant (all figures and the detailed experimental methodology), see our note [5].
For the original derivations, proofs, and further extensions, see our original papers [1] [2].
References
- [1] S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, DRIVE: One-bit Distributed Mean Estimation (2021), NeurIPS 2021.
- [2] S. Vargaftik, R. Ben-Basat, A. Portnoy, G. Mendelson, Y. Ben-Itzhak, M. Mitzenmacher, EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning (2022), ICML 2022.
- [3] A. Zandieh, M. Daliri, A. Hadian, V. Mirrokni, TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (2026), ICLR 2026.
- [4] A. Zandieh, M. Daliri, I. Han, QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead (2024), arXiv:2406.03482.
- [5] R. Ben-Basat, Y. Ben-Itzhak, G. Mendelson, M. Mitzenmacher, A. Portnoy, S. Vargaftik, A Note on TurboQuant and the Earlier DRIVE/EDEN Line of Work (2026), arXiv:2604.18555.
- [6] A. T. Suresh, F. X. Yu, S. Kumar, H. B. McMahan, Distributed Mean Estimation with Limited Communication (2017), ICML 2017.
- [7] VMware Open Source Blog, VMware Research Group’s EDEN Becomes Part of OpenFL (November 2022).
- [8] N. Cohen, A. Portnoy, B. Fetahu, A. Ingber, SDR: Efficient Neural Re-ranking using Succinct Document Representation (2022), ACL 2022.
- [9] V. Malinovskii, A. Panferov, I. Ilin, H. Guo, P. Richtárik, D. Alistarh, HIGGS: Pushing the Limits of Large Language Model Quantization via the Linearity Theorem (2025), NAACL 2025.
- [10] A. Panferov, E. Schultheis, S. Tabesh, D. Alistarh, Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation (2026), arXiv:2601.22813.
- [11] A. Shutova, V. Malinovskii, V. Egiazarian, D. Kuznedelev, D. Mazur, N. Surkov, I. Ermakov, D. Alistarh, Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models (2025), ICML 2025.