A Comparison of Statistical Models and Deep Learning for Data with Binary Response and Longitudinal Covariates
In statistics, longitudinal data are data in which the response variable and the explanatory variables are measured repeatedly for each subject. In the machine learning literature, however, the term can also refer to data in which only the explanatory variables are measured repeatedly, but not the response variable. This thesis compared two statistical models (baseline logistic regression and the two-stage joint model) with two neural network approaches (the feed-forward neural network and the recurrent neural network with long short-term memory) in terms of prediction sensitivity, specificity, area under the receiver operating characteristic curve, and Brier score. The methods were applied to data from two clinical trials and evaluated in a simulation study. For the datasets generated and studied in this thesis, the neural network approaches showed no advantage over the statistical models.
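For readers less familiar with the four evaluation metrics named above, the following is a minimal sketch of how they can be computed from a binary response and predicted probabilities. It is an illustration only, not the code used in the thesis: the function name `binary_metrics`, the 0.5 classification threshold, and the toy data are all assumptions introduced here. The area under the ROC curve is computed through its rank-based (Mann-Whitney) interpretation, which is one standard equivalent definition.

```python
def binary_metrics(y_true, p_hat, threshold=0.5):
    """Sensitivity, specificity, AUC, and Brier score for a binary response.

    y_true: observed 0/1 outcomes; p_hat: predicted probabilities of outcome 1.
    The threshold of 0.5 is an assumed default, not a value from the thesis.
    Assumes at least one positive and one negative outcome are present.
    """
    # Classify each subject by thresholding the predicted probability.
    y_pred = [1 if p >= threshold else 0 for p in p_hat]

    tp = sum(1 for y, d in zip(y_true, y_pred) if y == 1 and d == 1)
    fn = sum(1 for y, d in zip(y_true, y_pred) if y == 1 and d == 0)
    tn = sum(1 for y, d in zip(y_true, y_pred) if y == 0 and d == 0)
    fp = sum(1 for y, d in zip(y_true, y_pred) if y == 0 and d == 1)

    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Brier score: mean squared difference between probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)

    # AUC as the probability that a randomly chosen positive subject receives
    # a higher predicted probability than a randomly chosen negative subject,
    # with ties counted as one half.
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {"sensitivity": sensitivity, "specificity": specificity,
            "auc": auc, "brier": brier}


# Toy data (hypothetical, for illustration only).
example = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(example)
```

Higher sensitivity, specificity, and AUC indicate better discrimination, while a lower Brier score indicates better-calibrated and more accurate probabilities. Sensitivity and specificity depend on the chosen threshold; AUC and the Brier score do not.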