A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data

dc.contributor.advisor Nadeem, Khurram
dc.contributor.author Jabri, Mehdi-Abderrahman
dc.date.accessioned 2021-08-18T18:59:31Z
dc.date.available 2021-08-18T18:59:31Z
dc.date.copyright 2021-08
dc.date.created 2021-08-12
dc.identifier.uri https://hdl.handle.net/10214/26280
dc.description.abstract This thesis develops a novel automatic variable ranking and selection algorithm for regularised ordinary logistic regression (OLR) models in the presence of severe class imbalance, potentially involving large-scale datasets. We also consider the possibility of strong correlation among a subset of signal and noise covariates. Our algorithm uses an ensemble of regularised OLR model fits, such as the Least Absolute Shrinkage and Selection Operator (LASSO), the two-stage Adaptive LASSO, and Ridge Regression, to obtain stable variable rankings. It also provides three automatic selection methods that recover a set of influential variables from the rank scores derived from the ensemble of model fits. In our simulation study, the algorithm was robust to severe class imbalance in the presence of highly correlated covariates: it consistently obtained stable variable rankings, and each automatic selection method recovered a high proportion of the signal covariates while filtering out noise. We illustrate the methodology on a large volume of severely imbalanced, high-dimensional wildland fire data; the approach can also be applied in other areas such as genomics and fraud detection. en_US
dc.language.iso en en_US
dc.publisher University of Guelph en_US
dc.subject Ordinary logistic regression en_US
dc.subject Automatic variable ranking en_US
dc.subject Large scale datasets en_US
dc.subject Least Absolute Shrinkage and Selection Operator en_US
dc.subject Selection Algorithm en_US
dc.title A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data en_US
dc.type Thesis en_US
dc.degree.programme Mathematics and Statistics en_US
dc.degree.name Master of Science en_US
dc.degree.department Department of Mathematics and Statistics en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
dc.degree.grantor University of Guelph en_US


Files in this item

File: Jabri_Mehdi_202108_MSc.pdf
Size: 968.8Kb
Format: PDF
