Evaluation of robust regression on sampling estimation from a mixture of finite bivariate populations
In this thesis, we examine regression methods in a sample survey setting when the population is sampled from a mixture of bivariate normal distributions. We consider the estimation of the population total of a response variable y in the presence of a covariate x whose population total is known. Under standard conditions, Ordinary Least Squares (OLS) is considered a good procedure. However, OLS regression is not necessarily the best choice when the population is sampled from a mixture of bivariate normal distributions. We examine whether robust regression (RR) is a worthwhile alternative to OLS in this case. We simulate data from one or two bivariate normal populations. The second population differs from the first by a shift in either the first or the second measured variable. We vary both the size of that shift and the proportion of the population drawn from the second distribution. The RR methods examined are Least Median of Squares, Least Trimmed Mean Squares, M-estimation, and MM-estimation, each of which is compared with OLS regression. The main measures used in the evaluation are the Mean Squared Error, the Bias, the Estimated Variance, and the Empirical Variance. The last of these can be calculated because the data are simulated. An estimated variance for robust regression in the sampling context has not yet appeared in the literature. We propose such an estimator in this thesis and compare it with the empirical variance to assess how well it approximates the true sampling variability.
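The simulation design described above can be illustrated with a minimal sketch. It is not the thesis's code: the component means (10 and 20), the correlation of 0.8, the unit variances, simple random sampling without replacement, and all function names are assumptions made for illustration. Only the OLS baseline is shown, using the standard regression estimator of a total, N * (ybar + b * (Xbar - xbar)); a robust fit would replace the slope b, and the proposed variance estimator is not reproduced here.

```python
import math
import random
import statistics


def make_population(n_total, prop_shifted, shift_x, shift_y, rho=0.8, seed=0):
    """Build a finite population of (x, y) pairs from a two-component
    bivariate normal mixture. Means, variances and rho are hypothetical."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_total):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = 10.0 + z1
        y = 20.0 + rho * z1 + math.sqrt(1 - rho ** 2) * z2
        # With probability prop_shifted the unit belongs to the second
        # component, which differs from the first only by a location shift.
        if rng.random() < prop_shifted:
            x += shift_x
            y += shift_y
        population.append((x, y))
    return population


def ols_regression_total(sample, n_total, x_total):
    """Regression estimator of the y total, using the OLS slope and the
    known population total of x."""
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
    slope = sxy / sxx
    return n_total * (y_bar + slope * (x_total / n_total - x_bar))


def simulate(population, sample_size, n_reps, seed=1):
    """Draw repeated simple random samples and summarise the estimator's
    bias, empirical variance and mean squared error."""
    rng = random.Random(seed)
    n_total = len(population)
    x_total = sum(x for x, _ in population)
    y_total = sum(y for _, y in population)
    estimates = [
        ols_regression_total(rng.sample(population, sample_size), n_total, x_total)
        for _ in range(n_reps)
    ]
    bias = statistics.fmean(estimates) - y_total
    empirical_variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - y_total) ** 2 for e in estimates])
    return {"bias": bias, "empirical_variance": empirical_variance, "mse": mse}


if __name__ == "__main__":
    # 10% of units shifted by 5 in y only; values chosen arbitrarily.
    pop = make_population(n_total=2000, prop_shifted=0.10, shift_x=0.0, shift_y=5.0)
    print(simulate(pop, sample_size=50, n_reps=1000))
```

Because the true total is known for a simulated population, the bias and the empirical variance can be computed directly from the replicated estimates, which is what makes a comparison against an estimated variance possible.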