This workshop covers the latest developments in novel methodologies for Binary Neural Networks and their application to Computer Vision, bringing together a diverse group of researchers working in several related areas.
Authors are welcome to submit full 8-page papers or short 2-page extended abstracts on any of the following topics:
|Paper submission deadline:|
|Camera ready papers due:|
|Extended abstract submission:|
|Extended abstract decisions:|
|Workshop Date:||June 25th, 2021|
Please upload submissions at: cmt
The Workshop will take place on the 25th of June according to the following schedule. All times are in BST (UTC+1).
|20:00 - 20:10||Opening remarks and workshop kickoff|
|20:10 - 20:40||Invited talk: Daniel Soudry - On depth and data limitations with extreme quantization
We examine three aspects of quantized neural nets:
|20:40 - 21:10||Invited talk: Nicholas Lane - What is Next for the Efficient Machine Learning Revolution?
Mobile and embedded devices increasingly rely on deep neural networks to understand the world -- a formerly impossible feat that would have overwhelmed their system resources just a few years ago. The age of on-device artificial intelligence is upon us; but incredibly, these dramatic changes are just the beginning. Looking ahead, mobile machine learning will extend beyond just classifying categories and perceptual tasks, to roles that alter how every part of the systems stack of smart devices functions. This evolutionary step in constrained-resource computing will finally produce devices that meet our expectations in how they can learn, reason and react to the real world. In this talk, I will briefly discuss the initial breakthroughs that allowed us to reach this point, and outline the next set of open problems we must overcome to bring about this next deep transformation of mobile and embedded computing.
|21:10 - 21:15||Short Break|
|21:15 - 21:30||Oral talk 1|
|21:30 - 21:45||Oral talk 2|
|21:45 - 22:15||Invited talk: Diana Marculescu - milliJoules for 1000 Inferences: Machine Learning Systems ‘on the Cheap’
Machine learning (ML) applications have entered and impacted our lives unlike any other technological advance from the recent past. While the holy grail for judging the quality of an ML model has largely been serving accuracy, and only recently its resource usage, neither of these metrics translates directly to energy efficiency, runtime, or mobile device battery lifetime. This talk uncovers the need for designing efficient convolutional neural networks (CNNs) for deep learning mobile applications that operate under stringent energy and latency constraints. We show that while CNN model quantization and pruning are effective tools in bringing down the model size and resulting energy cost by up to 1000x while maintaining baseline accuracy, the interplay between bitwidth, channel count, and CNN memory footprint uncovers a non-trivial trade-off. Surprisingly, there exists a single weight bitwidth that is superior to others for a given storage constraint, even outperforming mixed-precision quantization. Our results show that even when the channel count is allowed to change, a single weight bitwidth can be sufficient for model compression, which greatly reduces the software and hardware optimization costs for CNN-based ML systems.
|22:15 - 22:30||Break|
|22:30 - 23:00||Invited Talk: Tim de Bruin - BNNs for TinyML: performance beyond accuracy
Over the past few years, there has been a lot of exciting progress in the field of Binary Neural Networks. New training methods and network architectures have enabled rapid increases in accuracy, especially on traditional computer vision benchmarks such as ImageNet -- closing the gap to higher bit-width models while delivering on the promise of increased inference efficiency. At Plumerai, we are strong believers in BNNs. We think that their reduced memory, energy, and computational needs will be especially relevant in the subfield of TinyML, where they can enable previously infeasible products. However, the TinyML field does bring a unique set of challenges: from the quality of the data coming from the low-cost sensors to extreme constraints on the model architectures imposed by the available hardware. This means that solutions developed for ImageNet do not always generalize to this domain. These challenges also extend beyond simply obtaining a high enough accuracy, as real-world performance is often more nuanced, requiring stable predictions and a good understanding of model biases. We demonstrate the effects of binarization within this domain. We start by demonstrating how binary convolutions make networks more sensitive to small changes to their inputs. We then show how changes in network architectures designed to more easily carry gradients during training cause models to pick up on different biases in their training data. We also explain how we combine our own collected data with our tiny BNNs into a tool to look at publicly available datasets, and some of the sampling biases they contain. Finally, we make the case for an increased research focus on BNNs in the TinyML domain. Given the need for the strengths of BNNs in this domain, the lower computational cost of experiments, and the fact that smaller networks bring some of the remaining challenges of BNNs more clearly into focus, we believe that research into TinyML-BNNs could be especially impactful.
|23:00 - 23:15||Oral talk 3|
|23:15 - 23:30||Oral talk 4|
|23:30 - 23:35||Short Break|
|23:35 - 00:05||Invited Talk: Mohammad Rastegari and Maxwell Horton - Data-Free Model Compression
Efficiently compressing a trained neural network without using any data is very challenging. Our data-free method requires 14x-450x fewer FLOPs than comparable state-of-the-art methods. We break the problem of data-free network compression into a number of independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. We show state-of-the-art performance on MobileNetV1 for data-free low-bit-width quantization. We also show state-of-the-art performance on data-free pruning of EfficientNet B0 when combining our method with end-to-end generative methods.
|00:05 - 00:10||Closing remarks and Conclusions|