Enhancing Learning Capability of Convolutional Neural Networks for Fundamental Vision Problems

Title: Enhancing Learning Capability of Convolutional Neural Networks for Fundamental Vision Problems
Author: Wang, Mingjie
Department: School of Computer Science
Program: Computer Science
Advisor: Gong, Minglun
Abstract: During the past decade, Convolutional Neural Networks (CNNs) have dominated the realm of computer vision and have become a de facto standard for modern data-driven algorithms. Thanks to their strong representational capacity, they have exhibited state-of-the-art performance in many vision tasks; hence, further enhancing CNNs' learning capability has broad impact. This thesis presents several such enhancements through novel techniques, mechanisms, structures, and learning paradigms, which are evaluated on two main types of learning tasks, namely classification and regression. The thesis starts with the image classification problem, since it underpins a wide range of downstream vision tasks. Inspired by the success of efficient feature reuse in DenseNet and by drop-based stochastic regularization techniques, Stochastic Feature Reuse is presented to boost the capacity and generalization of DenseNet by randomly dropping densely reused features from preceding layers. A Multi-scale Convolution Aggregation module is also explored to facilitate learning scale-invariant representations. Albeit promising, the resulting algorithm still inherits DenseNet's limitations of large model size and superfluous feature reuse; to extract highly discriminative features with more compact models, a Layer-wise Attention condenser is designed to form a stronger variant. To study the impacts on regression problems, the second part of the thesis focuses on the crowd counting problem, since it expects a single, unconstrained value as output, making it more challenging and representative than other regression tasks. Motivated by the ideas of diverse receptive fields and stochastic regularization, a Stochastic Multi-Scale Aggregation Network is proposed to enrich the scale diversity of feature maps and to combat overfitting.
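The core idea of Stochastic Feature Reuse, dropping whole feature maps from preceding layers before the channel-wise concatenation a DenseNet layer performs, can be sketched as follows. This is a minimal, hypothetical illustration (function name and keep probability are assumptions, not the thesis implementation):

```python
import numpy as np

def stochastic_feature_reuse(features, keep_prob=0.8, rng=None):
    """Randomly drop preceding layers' feature maps before concatenation.

    features : list of arrays shaped (C_i, H, W), one per preceding layer.
    Each earlier map survives independently with probability `keep_prob`;
    the most recent map is always kept so the next layer has input.
    """
    if rng is None:
        rng = np.random.default_rng()
    kept = [f for f in features[:-1] if rng.random() < keep_prob]
    kept.append(features[-1])  # newest features always flow forward
    return np.concatenate(kept, axis=0)  # channel-wise concatenation
```

At `keep_prob=1.0` this reduces to standard dense connectivity; lower values thin the reused features stochastically at training time, acting as a regularizer.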
To further strengthen the capacity for handling drastic scale variations, a Single-column Scale-invariant Network is presented, which extracts sophisticated scale-invariant features via a fabric-like combination of interlayer scale integration and a novel intralayer Scale-invariant Transformation. To explore fine-grained group convolutions and the effects of multi-task supervision on network capacity, an innovative Scale Tree Network is presented that parses scale information hierarchically and efficiently by incorporating a tree-like structure; a Multi-level Auxiliator is also proposed to facilitate the recognition of cluttered backgrounds. Finally, a weakly-supervised counting framework, referred to as CrowdMLP and characterized by count-level annotations and a multi-granularity MLP architecture, is presented to model global-range receptive fields for regression problems. Extensive experiments on widely-used benchmark datasets demonstrate the effectiveness of the proposed strategies and design principles in enhancing learning capability for two fundamental vision problems, achieving superior classification and counting accuracy. Ablation studies and visualizations are also performed to shed light on the impacts and behaviours of individual components.
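The multi-scale aggregation theme running through the counting networks can be illustrated with a small sketch: pool a feature map at several scales to mimic diverse receptive fields, upsample each result back, and average. This is a hypothetical simplification for intuition only (scale set and pooling choice are assumptions, and spatial dimensions are assumed divisible by each scale):

```python
import numpy as np

def multi_scale_aggregate(feat, scales=(1, 2, 4)):
    """Aggregate one 2-D feature map over several pooling scales.

    feat : array of shape (H, W); H and W must be divisible by each scale.
    Returns the average of the per-scale pooled-then-upsampled maps.
    """
    h, w = feat.shape
    outs = []
    for s in scales:
        # block-average pooling with window and stride s
        pooled = feat.reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        # nearest-neighbour upsampling back to the original resolution
        outs.append(np.repeat(np.repeat(pooled, s, axis=0), s, axis=1))
    return np.mean(outs, axis=0)
```

Larger scales summarize wider context (helpful for dense, small-scale crowds), while scale 1 preserves detail; averaging blends the two, which is the intuition behind enriching the scale diversity of feature maps.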
Date: 2022-08
Rights: Attribution-NonCommercial 4.0 International
Related Publications:
Wang, Mingjie, et al. "Multi-scale convolution aggregation and stochastic feature reuse for DenseNets." 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019. doi:10.1109/WACV.2019.00040.
Wang, Mingjie, et al. "ADNet: Adaptively Dense Convolutional Neural Networks." Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020. doi:10.1109/WACV45572.2020.9093431.
Wang, Mingjie, et al. "Stochastic multi-scale aggregation network for crowd counting." ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. doi:10.1109/ICASSP40776.2020.9054238.
Wang, Mingjie, et al. "Interlayer and intralayer scale aggregation for scale-invariant crowd counting." Neurocomputing 441 (2021): 128-137. doi:10.1016/j.neucom.2021.01.112.
Wang, Mingjie, et al. "STNet: Scale Tree Network with Multi-level Auxiliator for Crowd Counting." IEEE Transactions on Multimedia (2022). doi:10.1109/TMM.2022.3142398.

Files in this item

File Size Format Description
Wang_Mingjie_202208_PhD.pdf 53.44 MB PDF Dissertation
