Deep Information Compression for Robust Computer Vision

University of Guelph

Deep Neural Networks (DNNs) are a modeling technique capable of automatically extracting complex patterns from data, key to the modern practice of Deep Learning (DL). It is commonly held among DL practitioners that larger models are more accurate. Developing large DNNs, however, requires large-scale computing resources that are ill-suited for embedded applications, which must balance accuracy, energy consumption, and other practical constraints. Large DNNs have also been shown to be sensitive to small input perturbations and to idiosyncrasies of the data collection procedure. In other words, although large DNNs achieve high predictive accuracy for datasets collected in carefully controlled settings, they often lack robustness to subtle discrepancies between these datasets and real-world data-generating processes. This lack of robustness poses security and reliability risks for high-stakes Artificial Intelligence (AI) applications in which DNNs are increasingly used.

This thesis by articles addresses two open questions in DL: i) Why do large DNNs generalize so well on standard benchmarks, despite classical statistics predicting poor performance for such overparameterized models? ii) Why do these same DNNs lack robustness to small input perturbations, despite their excellent, even “superhuman”, benchmark performance? The thesis addresses these questions with an emphasis on computer vision tasks, for which DL has become a dominant paradigm. The common thread between the four articles is the notion of compression, which I suggest is crucial for robust DNN predictions.

The first article shows that dataset complexity may be more predictive of generalization than DNN complexity. Experiments on infinite-width DNNs show that information compression serves as a useful generalization bound. The bound is more closely aligned with robustness than with standard generalization, which may be attributed to non-robust features or “shortcuts” in common Machine Learning (ML) datasets that make standard generalization possible without learning task-relevant information. The second article investigates the robustness of a practical, hardware-inspired DNN compression procedure: binarization of DNN weights and activations. Whereas previous work focused on efficiently training binarized networks to be accurate, this work shows that they may also have improved test-time robustness to input perturbations. The third article shows that the Batch Normalization (BN) layer commonly used to accelerate training degrades the robustness of state-of-the-art DNNs. In a simplified setting, BN reduces the distance between data points and the classification boundary, which can be interpreted as excessive compression that destroys task-relevant information. Theoretical principles devised on academic datasets are most valuable when they also prove useful in industry. To test this, the final article develops a DNN-based method new to the field of ecology. Here, input compression bounds the regression error of invasive species biomass estimation from fewer than two hundred training samples. The bound improves confidence in the ecological method and demonstrates broad practical benefits of compression.
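To make the compression procedure of the second article concrete, the following is a minimal sketch of deterministic sign binarization, the standard operation for mapping real-valued DNN weights or activations to ±1. The function name and toy weight values are illustrative, not taken from the thesis; the articles' own training procedure (e.g., how gradients are propagated through the non-differentiable sign) is not shown here.

```python
import numpy as np

def binarize(w):
    """Deterministic sign binarization: map each real value to +1 or -1.

    Each binarized entry needs only a single bit of storage, which is
    the sense in which binarization compresses the network.
    """
    b = np.sign(w)      # +1, -1, or 0 for exact zeros
    b[b == 0] = 1       # break ties at zero toward +1
    return b

# Toy weight matrix standing in for one DNN layer (illustrative values).
w = np.array([[0.7, -0.2],
              [0.0, -1.3]])
print(binarize(w))      # every entry is +1 or -1
```

At inference time, multiply-accumulate operations over such ±1 values reduce to cheap sign flips and additions, which is why the procedure is described as hardware-inspired.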

deep learning, neural network, computer vision, information bottleneck, mutual information, compression
Angus Galloway, Anna Golubeva, Mahmoud Salem, Mihai Nica, Yani Ioannou, and Graham W. Taylor. Bounding generalization error with input compression: An empirical study with infinite-width networks. Transactions on Machine Learning Research, 2022. ISSN 2835-8856.
Angus Galloway, Dominique Brunet, Reza Valipour, Megan McCusker, Johann Biberhofer, Magdalena Sobol, Medhat Moussa, and Graham W. Taylor. Predicting dreissenid mussel abundance in nearshore waters using underwater imagery and deep learning. Limnology and Oceanography: Methods, 20(4):233–248, 2022.
Angus Galloway, Anna Golubeva, Thomas Tanay, Medhat Moussa, and Graham W. Taylor. Batch normalization is a cause of adversarial vulnerability. In ICML 2019 Workshop on Identifying and Understanding Deep Learning Phenomena, 2019.
Angus Galloway, Graham W. Taylor, and Medhat Moussa. Attacking binarized neural networks. In International Conference on Learning Representations, 2018.