Regularized Regression Methods and Neural Networks for Modeling Fish Population Health with Water Quality Variables in the Athabasca Oil Sands Region
This thesis develops statistical models for fish population health measures, including adjusted trout-perch body weight, gonad weight, and liver weight, using climate, environmental, and water quality variables measured in the Athabasca River. To identify relevant variables, we considered three variable selection techniques: stepwise regression, the lasso, and the elastic net (EN). The lasso and EN generally produced better-performing regression models than stepwise regression for each response. Uranium (U), tungsten (W), tellurium (Te), pH, molybdenum (Mo), and antimony (Sb) were each found to be important for at least one response. U, Te, and Mo had relatively large coefficients in both the adjusted gonad weight and adjusted liver weight models, suggesting that they may influence the development of trout-perch organs. Neural networks (NNs) were also considered as a way to improve the prediction accuracy for the fish population endpoints. The NNs outperformed the regularization techniques in predicting the adjusted body weight, but not the adjusted gonad or liver weights.
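For reference, the lasso and the EN can be written as a single penalized least-squares problem. The form below is a sketch using one common parameterization (the one used by the glmnet software); the symbols $n$, $y_i$, $x_i$, $\beta_0$, $\beta$, $\lambda$, and $\alpha$ are notation assumed here for illustration and may differ from the notation and scaling used in the body of the thesis.

```latex
\[
  (\hat{\beta}_0, \hat{\beta})
  = \operatorname*{arg\,min}_{\beta_0,\,\beta}
    \; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
    + \lambda \Bigl[ \alpha \,\lVert \beta \rVert_1
    + \frac{1-\alpha}{2} \,\lVert \beta \rVert_2^2 \Bigr],
  \qquad \lambda \ge 0, \; 0 \le \alpha \le 1 .
\]
```

Under this parameterization, $\alpha = 1$ gives the lasso, $0 < \alpha < 1$ gives the EN, and $\alpha = 0$ reduces to ridge regression. The $\ell_1$ term can shrink some coefficients exactly to zero, which is what allows the lasso and the EN to act as variable selection techniques.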