Title:
|
A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data |
Author:
|
Jabri, Mehdi-Abderrahman
|
Department:
|
Department of Mathematics and Statistics |
Program:
|
Mathematics and Statistics |
Advisor:
|
Nadeem, Khurram |
Abstract:
|
This thesis develops a novel automatic variable ranking and selection algorithm for regularised ordinary logistic regression (OLR) models under severe class imbalance, potentially with large-scale datasets. We also consider the possibility of strong correlation among a subset of signal and noise covariates. Our algorithm uses an ensemble of regularised OLR model fits, such as the Least Absolute Shrinkage and Selection Operator (LASSO), the two-stage adaptive LASSO, and ridge regression, to obtain stable variable rankings. The algorithm also provides three automatic selection methods that recover a set of influential variables from rank scores derived from the ensemble of model fits. In a simulation study, the algorithm was robust to severe class imbalance in the presence of highly correlated covariates: it consistently produced stable variable rankings, and each automatic selection method recovered a high proportion of the signal covariates whilst filtering out noise. We illustrate the methodology on a large volume of severely imbalanced, high-dimensional wildland fire data. The methodology can also be applied in other areas, such as genomics and fraud detection. |
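As a rough illustration of the general idea of ranking variables from an ensemble of regularised logistic fits, the following minimal sketch may help. It is not the thesis's algorithm: the toy simulation, the random half-sample ensemble, the fixed penalty `lam`, the mean-rank score, and the function names `simulate_imbalanced`, `fit_l1_logistic`, and `rank_variables` are all assumptions made for this example. Only an L1 (LASSO-type) penalty is shown, fitted with a simple proximal gradient loop.

```python
import numpy as np


def simulate_imbalanced(n=2000, p=10, seed=0):
    """Toy data (assumed): the first three covariates carry signal, the rest are noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -2.0, 1.5]
    eta = -5.0 + X @ beta  # large negative intercept makes positives rare
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
    return X, y


def fit_l1_logistic(X, y, lam, n_iter=300):
    """L1-penalised logistic regression by proximal gradient descent (ISTA).

    The intercept is unpenalised. Returns the slope coefficients only.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    # Step size 1/L, where L = ||Xa||_2^2 / (4n) bounds the curvature of the mean log-loss.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        eta = np.clip(b0 + X @ beta, -30.0, 30.0)  # clip to avoid overflow in exp
        resid = 1.0 / (1.0 + np.exp(-eta)) - y
        b0 -= step * resid.mean()
        z = beta - step * (X.T @ resid) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return beta


def rank_variables(X, y, lam=0.02, n_fits=20, frac=0.5, seed=0):
    """Average per-fit ranks of |coefficient| over fits on random subsamples.

    The mean-rank score is an assumption of this sketch, not the thesis's rank score.
    """
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise so penalties are comparable
    n, p = Xs.shape
    m = int(frac * n)
    ranks = np.empty((n_fits, p))
    for b in range(n_fits):
        idx = rng.choice(n, size=m, replace=False)
        coef = np.abs(fit_l1_logistic(Xs[idx], y[idx], lam))
        # Rank 0 is the largest |coefficient| in this fit.
        ranks[b] = np.argsort(np.argsort(-coef, kind="stable"), kind="stable")
    mean_rank = ranks.mean(axis=0)
    order = np.argsort(mean_rank, kind="stable")  # best-ranked variable first
    return order, mean_rank


X, y = simulate_imbalanced()
order, mean_rank = rank_variables(X, y)
print("positive rate:", y.mean())
print("variables, best first:", order.tolist())
```

Averaging ranks over many fits, rather than trusting a single fit, is one simple way to damp the instability that a small number of positive cases can introduce. A fuller treatment would also vary the penalty type and tuning, and would add an explicit rule for cutting the ranked list into selected and discarded variables.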
URI:
|
https://hdl.handle.net/10214/26280
|
Date:
|
2021-08 |
Terms of Use:
|
All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated. |