Data Integration from Multiple Historical Sources to Study Canadian Casualties of WWI

dc.contributor.advisor: Antonie, Luiza
dc.contributor.author: Gadgil, Harshavardhan
… of Computer Science
University of Guelph
Master of Science
… Science
dc.description.abstract: Longitudinal data (data that observe the same entities at different points in time) are of interest to historians and social scientists because they make it possible to study populations over time. In this thesis, we construct longitudinal data by integrating four historical sources to study Canadian casualties of World War I. Because labeled data are unavailable for two of the three linkage tasks, and because our application has a low tolerance for false matches, we develop a simple stepwise deterministic strategy to integrate the four datasets. For the one linkage task for which labeled data are available, we compare this strategy with a linkage approach that incorporates a Support Vector Machine. We then demonstrate the utility of the resulting longitudinal dataset with a multivariate regression analysis of the factors that influenced a Canadian soldier's likelihood of surviving World War I. Our findings indicate that a cautious stepwise deterministic strategy that incorporates approximate comparisons and domain knowledge can perform on par with a linkage approach that incorporates a supervised learning algorithm, without requiring labeled data. The regression analysis reveals several patterns of historical importance in early 20th-century Canada that warrant further historical investigation.
dc.publisher: University of Guelph
dc.rights: Attribution-NonCommercial 2.5 Canada
dc.subject: record linkage
dc.subject: data integration
dc.subject: historical data
dc.subject: longitudinal data
dc.title: Data Integration from Multiple Historical Sources to Study Canadian Casualties of WWI
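The stepwise deterministic strategy summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the field names (`id`, `surname`, `given`, `birth_year`), the two rules, the 0.85 threshold, and the use of the standard library's `difflib.SequenceMatcher` as the approximate comparator are all assumptions made for illustration. Each pass applies a looser rule than the one before it to the records that are still unlinked, and a pair is accepted only when the match is unique in both directions, reflecting the low tolerance for false matches described above.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record layout: a dict with "id", "surname", "given", "birth_year".
Record = Dict[str, Any]
Rule = Callable[[Record, Record], bool]


def similar(a: str, b: str) -> float:
    """Approximate string similarity in [0, 1]; difflib is a stand-in comparator."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_rule(a: Record, b: Record) -> bool:
    # Pass 1 (strict): every identifying field agrees exactly.
    return (a["surname"] == b["surname"]
            and a["given"] == b["given"]
            and a["birth_year"] == b["birth_year"])


def approx_rule(a: Record, b: Record, threshold: float = 0.85) -> bool:
    # Pass 2 (looser): birth year must agree; names need only be similar.
    return (a["birth_year"] == b["birth_year"]
            and similar(a["surname"], b["surname"]) >= threshold
            and similar(a["given"], b["given"]) >= threshold)


def stepwise_link(left: List[Record], right: List[Record],
                  rules: List[Rule]) -> List[Tuple[Any, Any]]:
    """Apply rules in order; keep only one-to-one matches; remove linked records."""
    links: List[Tuple[Any, Any]] = []
    unlinked_left, unlinked_right = list(left), list(right)
    for rule in rules:
        # Candidate right-hand records for each still-unlinked left record.
        candidates = {a["id"]: [b for b in unlinked_right if rule(a, b)]
                      for a in unlinked_left}
        # How many left records claim each right record under this rule.
        claims: Dict[Any, int] = {}
        for matches in candidates.values():
            for b in matches:
                claims[b["id"]] = claims.get(b["id"], 0) + 1
        # Accept a pair only if it is unambiguous in both directions.
        accepted = [(a_id, matches[0]["id"])
                    for a_id, matches in candidates.items()
                    if len(matches) == 1 and claims[matches[0]["id"]] == 1]
        links.extend(accepted)
        linked_left = {a for a, _ in accepted}
        linked_right = {b for _, b in accepted}
        unlinked_left = [r for r in unlinked_left if r["id"] not in linked_left]
        unlinked_right = [r for r in unlinked_right if r["id"] not in linked_right]
    return links


# Toy records invented for illustration only (not data from the thesis).
soldiers = [
    {"id": "S1", "surname": "MacDonald", "given": "John", "birth_year": 1890},
    {"id": "S2", "surname": "Tremblay", "given": "Joseph", "birth_year": 1895},
    {"id": "S3", "surname": "Smith", "given": "William", "birth_year": 1892},
]
census = [
    {"id": "C1", "surname": "MacDonald", "given": "John", "birth_year": 1890},
    {"id": "C2", "surname": "Tremblay", "given": "Josephe", "birth_year": 1895},
    {"id": "C3", "surname": "Smith", "given": "William", "birth_year": 1892},
    {"id": "C4", "surname": "Smyth", "given": "William", "birth_year": 1892},
]

links = stepwise_link(soldiers, census, [exact_rule, approx_rule])
print(links)  # [('S1', 'C1'), ('S3', 'C3'), ('S2', 'C2')]
```

Here S1 and S3 are linked by the exact pass, and S2 is linked only in the approximate pass ("Joseph" vs. "Josephe"). Because a record that matches two or more candidates under a rule is left unlinked rather than guessed, the sketch trades some recall for fewer false matches; the thesis's actual rules, comparators, and domain-knowledge checks are not reproduced here.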
Original bundle
2.37 MB
Adobe Portable Document Format
MSc. Thesis