A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data

dc.contributor.advisor Nadeem, Khurram
dc.contributor.author Jabri, Mehdi-Abderrahman
2021-08-18T18:59:31Z 2021-08-18T18:59:31Z 2021-08 2021-08-12
dc.description.abstract This thesis develops a novel automatic variable ranking and selection algorithm for regularised ordinary logistic regression (OLR) models in the presence of severe class imbalance, potentially involving large-scale datasets. We also consider the possibility of strong correlation among a subset of signal and noise covariates. Our algorithm utilizes an ensemble of regularised OLR model fits, such as the Least Absolute Shrinkage and Selection Operator (LASSO), the two-stage Adaptive LASSO, and Ridge Regression, to obtain stable variable rankings. The algorithm also considers three automatic selection methods, each of which recovers a set of influential variables from rank scores derived from the ensemble of model fits. The simulation study showed that our algorithm is robust to severe class imbalance in the presence of highly correlated covariates: it consistently obtained stable variable rankings, and each automatic selection method recovered a high proportion of signal covariates whilst filtering out noise. We illustrate the methodology on a large volume of severely imbalanced, high-dimensional wildland fire data, demonstrating its value; it can also be used in other application areas such as genomics and fraud detection. en_US
dc.language.iso en en_US
dc.publisher University of Guelph en_US
dc.subject Ordinary logistic regression en_US
dc.subject Automatic variable ranking en_US
dc.subject Large scale datasets en_US
dc.subject Least Absolute Shrinkage and Selection Operator en_US
dc.subject Selection Algorithm en_US
dc.title A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data en_US
dc.type Thesis en_US Mathematics and Statistics en_US Master of Science en_US Department of Mathematics and Statistics en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated. University of Guelph en_US

Files in this item

Files Size Format View
Jabri_Mehdi_202108_MSc.pdf 968.8Kb PDF View/Open
