Binary and Extreme Quantization for Computer Vision

ICCV 2025 Workshop

Covering the latest developments in novel methodologies for Extreme Quantization and Binary Neural Networks, and their applications to Computer Vision. Bringing together a diverse group of researchers working in several related areas.

Workshop Description

The pervasive deployment of deep learning models on resource-constrained devices necessitates a critical focus on model compactness, computational efficiency, and power consumption. A key pathway to achieving these goals lies in low-bit quantization, a technique that represents model weights and/or activations with a reduced number of bits (e.g., 2 or 4 bits) instead of the standard 32-bit floating point. This significantly reduces model size and computational demands, enabling efficient on-device inference.

While low-bit quantization offers immense potential for model compression and accelerated computation, a central challenge remains: how to train and deploy these models while maintaining accuracy comparable to their full-precision counterparts. Recent advancements have demonstrated the feasibility of highly accurate low-bit quantized networks, opening new avenues for their application across diverse domains.

This workshop aims to bring together a diverse group of researchers and practitioners from academia and industry to discuss the latest advancements, identify open problems, and foster collaborations in the exciting and rapidly evolving field of efficient deep learning through low-bit quantization. We invite submissions and presentations on novel algorithms, theoretical insights, and practical applications related to this critical area of research.
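As a concrete illustration of the basic mechanism (a minimal sketch, not tied to any particular paper or framework; all names are illustrative), the following Python snippet fake-quantizes a weight tensor to 4 bits with a single per-tensor scale:

```python
# Minimal sketch of per-tensor symmetric "fake" quantization.
# Function and variable names are illustrative assumptions.
import torch

def quantize_symmetric(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Round w to num_bits signed levels, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 7 for 4 bits
    scale = w.abs().max() / qmax            # one scale for the whole tensor
    q = torch.clamp(torch.round(w / scale), min=-qmax - 1, max=qmax)
    return q * scale                        # floats again, but only 2**num_bits distinct values

w = torch.randn(256, 256)                   # stand-in for a layer's weights
w4 = quantize_symmetric(w, num_bits=4)
print(f"mean abs error at 4 bits: {(w - w4).abs().mean():.4f}")
```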

Call for Papers

Authors are welcome to submit full 8-page papers or short 2-page extended abstracts on any of the following topics:

  • Binary and low-bit quantized models for large vision-language models (LVLMs)
  • Post-training quantization for neural networks
  • New methodologies and architectures for training low-bit quantized neural networks (see the binarization sketch after this list)
  • Applications of low-bit NNs in computer vision (e.g., image classification, segmentation, object detection, 3D and video recognition)
  • Binary and low-bit quantization for generative models (e.g., Diffusion, Visual Autoregressive models)
  • Hardware implementation and on-device deployment of low-bit NNs
  • New methodologies combining quantization with other efficiency techniques (e.g., pruning, dynamic modeling)
  • Federated learning with low-bit quantization
  • On-device learning
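For the training-focused topics above, a common building block is the straight-through estimator (STE), which lets gradients pass through the non-differentiable sign() used to binarize weights. The sketch below is a minimal, illustrative version; the class name and clipping rule are assumptions, not any specific paper's method.

```python
# Minimal straight-through estimator (STE) for weight binarization.
# Names and the |w| <= 1 clipping rule are illustrative assumptions.
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)                # 1-bit weights in {-1, +1} (sign(0) = 0)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Straight-through: pass gradients where |w| <= 1, zero them elsewhere.
        return grad_output * (w.abs() <= 1).float()

w = torch.randn(8, 8, requires_grad=True)
BinarizeSTE.apply(w).sum().backward()
print(w.grad.abs().sum() > 0)               # gradients flow despite the hard sign()
```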

Important Dates

Paper submission deadline: July 3rd, 2025 (11:59pm PST)
Decisions: July 11th, 2025 (11:59pm PST)
Camera ready papers due: August 17th, 2025 (11:59pm PST)
Extended abstract submission: August 11th, 2025 (11:59pm PST)
Extended abstract decisions: August 17th, 2025 (11:59pm PST)
Workshop date: October 20th, 2025

Submission Guidelines

  • Papers included in ICCV proceedings: Submitted (full 8-page) papers must be formatted using the ICCV 2025 template and should adhere to the ICCV submission guidelines. The maximum file size for submissions is 50MB. The OpenReview-based review process will be double-blind. These submissions will be included in the proceedings and must contain new, previously unpublished material.
  • Extended abstracts NOT included in ICCV proceedings: We encourage the submission of extended abstracts (2 pages plus references) that summarize previously published or unpublished work. Extended abstracts will undergo a light single-blind review process. Please use the standard ICCV template, adjusting only the length.
    • Previously published work: We welcome papers previously published at CV/ML conferences, including ICCV 2025, that fall within the scope of the workshop.
    • Unpublished work: We also encourage submissions that summarize work in progress. This type of submission is intended to disseminate preliminary results or methods that fall within the overall scope of the workshop.

Please upload submissions at: link

Schedule

The Workshop will take place on the 20th of October according to the following schedule. All times are in GMT-10 (Hawaii Standard Time).

8:15 - 8:20 Opening remarks and workshop kickoff
8:20 - 8:50 Invited talk: Amir Gholami - XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization
Large language models power many applications today, but their deployment is increasingly limited by memory bottlenecks rather than computation. While GPU compute performance continues to grow rapidly, memory capacity and bandwidth improvements lag behind, creating a "memory wall" that slows down inference. In this talk, I will introduce XQuant, a new approach designed to break through this barrier. Rather than storing large key–value caches, XQuant quantizes and stores compact input activations, then reconstructs the key–value states on the fly. This shift yields substantial memory savings, up to 7–12× compared to standard FP16 baselines, while retaining near-original model accuracy. Building on this, I will discuss XQuant-CL, which leverages the surprising similarity of activations across layers to push memory compression even further. Together, these techniques demonstrate a forward-looking path: by trading a modest increase in computation for dramatic reductions in memory, we can make LLM inference far more efficient and scalable.
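To make the cache-the-activations idea concrete, here is a toy sketch under stated assumptions (an int8 per-tensor quantizer, single-head projections, made-up names); it illustrates the trade described in the abstract, not the actual XQuant implementation.

```python
# Toy version of the idea in the abstract: cache a quantized copy of the layer
# inputs X instead of K and V, and rematerialize K/V when attention needs them.
# The int8 quantizer, shapes, and names are simplifying assumptions.
import torch

d_model, n_tokens = 64, 16
W_k = torch.randn(d_model, d_model)         # key projection
W_v = torch.randn(d_model, d_model)         # value projection

def quantize_int8(x: torch.Tensor):
    scale = x.abs().max() / 127
    q = torch.clamp(torch.round(x / scale), min=-128, max=127).to(torch.int8)
    return q, scale

# Prefill: store one quantized activation tensor instead of two full-precision K/V tensors.
X = torch.randn(n_tokens, d_model)
X_q, X_scale = quantize_int8(X)

# Decode: rematerialize K and V on the fly from the cached activations.
X_hat = X_q.float() * X_scale
K = X_hat @ W_k
V = X_hat @ W_v
# Memory is traded for compute: two extra matmuls per step instead of a large KV cache.
```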
8:50 - 9:20 Invited talk: Mohammad Rastegari - TBD
[TBD]
9:20 - 9:30 Oral presentation
9:30 - 9:40 Oral presentation
9:40 - 10:50 Poster Session
10:50 - 11:20 Invited talk: Song Han - TBD
[TBD]
11:20 - 11:50 Invited talk: Raghu Krishnamoorthi - TBD
[TBD]
11:50 - 12:00 Oral presentation
12:00 - 12:05 Closing remarks and conclusions

Accepted papers

Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Keda TAO, Haoxuan You, Yang Sui, Can Qin, Huan Wang [Download]
Binary SqueezeNet: Enhancing Parameter Efficiency in Binary Neural Networks Salih Atabey, Erdem Akagündüz [Download]
Ultra-Efficient and Effective LLMs with Multi-Boolean Architectures Ba-Hien Tran, Van Minh Nguyen [Download]
PREFILT: Prefiltering for Fully Quantized Image Restoration Neural Networks Denis Makhov, Ruslan Ostapets, Irina Zhelavskaya, Dehua Song [Download]
Mitigating GELU Quantization Errors via Activation Distribution Shaping in Vision Transformer Wakayama Hiroyuki, Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi [Download]
Exploiting Information Redundancy in Attention Maps for Extreme Quantization of Vision Transformers Lucas Maisonnave, Karim Haroun, Tom Pegeot [Download]
MoPEQ: Mixture of Mixed Precision Quantized Experts Krishna Teja Chitty-Venkata, Jie Ye, Murali Emani [Download]
PTQAT: A Hybrid Parameter-Efficient Quantization Algorithm for 3D Perception Tasks Xinhao Wang, Zhiwei Lin, Zhongyu Xia, Yongtao Wang [Download]
Enhancing Generalization in Data-free Quantization via Mixup-class Prompting Jiwoong Park, Chaeun Lee, Yongseok Choi, Sein Park, Deokki Hong, Jungwook Choi [Download]
HC-PTQ: Poincaré-Based Hyperbolic Clustering for Data-Free Quantization of Vision Transformers Raffaele Mineo, Simone Palazzo, Concetto Spampinato, Francesco Rundo [Download]
Extreme Compression of Adaptive Neural Images Leo Hoshikawa, Marcos V. Conde, Takeshi Ohashi, Atsushi Irie [Download]
Certifying Robustness of Binary Neural Networks Using Sparse Polynomial Optimization Jianting Yang, Srecko Durasinovic, Victor Magron, Jean B. Lasserre, Jun Zhao [Download]
MSQ: Memory-Efficient Bit Sparsification Quantization Seokho Han, Seo Yeon Yoon, Jinhee Kim, Dongwei Wang, Kang Eun Jeon, Huanrui Yang, Jong Hwan Ko [Download]
Gradient-Free Training of Quantized Neural Networks Noa Cohen, Dotan Di Castro, Omkar Joglekar, Shir Kozlovsky, Vladimir Tchuiev, Michal Moshkovitz [Download]
VAR-Q: Tuning-free Quantized KV Caching for Visual Autoregressive Models Boxun Xu, Zihu Wang, Yu Wang, Zirui Liu, Peng Li [Download]
42 Million FPS on CIFAR-10 with Convolutional Differentiable Logic Gate Networks Felix Petersen, Hilde Kuehne, Christian Borgelt, Julian Welzel, Stefano Ermon [Download]
Probabilistic dynamic quantization for memory constrained devices Gabriele Santini, Francesco Paissan, Elisabetta Farella [Download]

Organizers

Adrian Bulat

Samsung AI

Zechun Liu

Meta Reality Labs

Nic Lane

University of Cambridge and Flower Labs

Georgios Tzimiropoulos

QMUL and Samsung AI

Supported by

RAIDO

Previous editions

CVPR 2021

ICCV 2023