A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data

Title: A Novel Automatic Variable Ranking and Selection Algorithm for Severely Imbalanced Big Binary Data
Author: Jabri, Mehdi-Abderrahman
Department: Department of Mathematics and Statistics
Program: Mathematics and Statistics
Advisor: Nadeem, Khurram
Abstract: This thesis develops a novel automatic variable ranking and selection algorithm for regularised ordinary logistic regression (OLR) models fitted to severely class-imbalanced and potentially large-scale datasets. We also allow for strong correlation among a subset of the signal and noise covariates. Our algorithm uses an ensemble of regularised OLR model fits, such as the Least Absolute Shrinkage and Selection Operator (LASSO), the two-stage adaptive LASSO, and ridge regression, to obtain stable variable rankings. The algorithm also includes three automatic selection methods that use the rank scores derived from the ensemble of model fits to recover a set of influential variables. Simulation results showed that the algorithm is robust to severe class imbalance in the presence of highly correlated covariates: it consistently produced stable variable rankings, and each automatic selection method recovered a high proportion of the signal covariates whilst filtering out noise covariates. We illustrate the methodology on a large, severely imbalanced, high-dimensional wildland fire dataset, demonstrating its value; the same approach can also be applied in other areas such as genomics and fraud detection.
URI: https://hdl.handle.net/10214/26280
Date: 2021-08
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
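The ensemble-ranking idea in the abstract can be illustrated with a minimal sketch in Python using only NumPy. This is not the thesis's implementation: the function names are hypothetical, and the proximal-gradient solver, class-balancing case weights, stratified bootstrap, penalty values, average-rank aggregation and the largest-gap cut-off (`select_by_largest_gap`) are all illustrative assumptions. The cut-off merely stands in for, and is not one of, the three automatic selection methods studied in the thesis.

```python
import numpy as np


def fit_penalised_logistic(X, y, lam, penalty="l1", l1_weights=None,
                           lr=0.5, n_iter=500):
    """Hypothetical solver: penalised logistic regression by (proximal) gradient
    descent. penalty="l1" is the LASSO (adaptive LASSO if l1_weights is given);
    penalty="l2" is ridge. The intercept is not penalised."""
    n, p = X.shape
    n_pos = y.sum()
    # Assumed imbalance correction: each class carries total case weight n/2.
    s = np.where(y == 1, n / (2 * n_pos), n / (2 * (n - n_pos)))
    w = np.ones(p) if l1_weights is None else l1_weights
    beta, b0 = np.zeros(p), 0.0
    for _ in range(n_iter):
        z = np.clip(X @ beta + b0, -30, 30)       # avoid overflow in exp
        r = s * (1.0 / (1.0 + np.exp(-z)) - y)    # weighted residuals
        b0 -= lr * r.mean()
        grad = X.T @ r / n
        if penalty == "l2":
            beta = beta - lr * (grad + lam * beta)
        else:
            beta = beta - lr * grad
            # Soft-thresholding: proximal step for the (weighted) L1 penalty.
            beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam * w, 0.0)
    return beta


def average_ranks(a):
    """Rank 1 = largest value; tied values (e.g. LASSO zeros) share their mean rank."""
    order = np.argsort(-a, kind="stable")
    sorted_a = a[order]
    ranks = np.empty(len(a))
    i = 0
    while i < len(a):
        j = i
        while j + 1 < len(a) and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ensemble_rank(X, y, n_boot=20, lam_l1=0.02, lam_l2=0.01, seed=0):
    """Mean |coefficient| rank of each covariate over bootstrap ridge,
    LASSO and two-stage adaptive LASSO fits (lower = more influential)."""
    rng = np.random.default_rng(seed)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    all_ranks = []
    for _ in range(n_boot):
        # Stratified bootstrap keeps the (rare) positive count fixed.
        idx = np.concatenate([rng.choice(pos, pos.size), rng.choice(neg, neg.size)])
        Xb, yb = X[idx], y[idx]
        ridge = fit_penalised_logistic(Xb, yb, lam_l2, penalty="l2")
        lasso = fit_penalised_logistic(Xb, yb, lam_l1, penalty="l1")
        # Stage two of the adaptive LASSO: L1 weights from stage-one ridge fit.
        ada = fit_penalised_logistic(Xb, yb, lam_l1, penalty="l1",
                                     l1_weights=1.0 / (np.abs(ridge) + 1e-3))
        for beta in (ridge, lasso, ada):
            all_ranks.append(average_ranks(np.abs(beta)))
    return np.mean(all_ranks, axis=0)


def select_by_largest_gap(mean_ranks):
    """Hypothetical cut-off: keep covariates ranked before the largest jump
    in the sorted mean-rank scores."""
    order = np.argsort(mean_ranks, kind="stable")
    k = int(np.argmax(np.diff(mean_ranks[order]))) + 1
    return order[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 2000, 10
    X = rng.standard_normal((n, p))
    true_beta = np.zeros(p)
    true_beta[:3] = [2.0, -2.0, 1.5]     # covariates 0-2 signal, 3-9 noise
    prob = 1 / (1 + np.exp(-(-4.0 + X @ true_beta)))  # rare positive class
    y = (rng.random(n) < prob).astype(float)
    scores = ensemble_rank(X, y)
    print("ranking:", np.argsort(scores), "selected:", select_by_largest_gap(scores))
```

Averaging ranks rather than raw coefficients puts the three penalty types on a common scale, and giving tied zero coefficients their mean rank avoids favouring noise covariates by column order.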


Files in this item

Jabri_Mehdi_202108_MSc.pdf (PDF, 968.8Kb)
