Data Integration from Multiple Historical Sources to Study Canadian Casualties of WWI

dc.contributor.advisor Antonie, Luiza
dc.contributor.author Gadgil, Harshavardhan
dc.date.accessioned 2017-03-31T14:04:24Z
dc.date.available 2017-03-31T14:04:24Z
dc.date.copyright 2017-03
dc.date.created 2017-02-13
dc.date.issued 2017-03-31
dc.identifier.uri http://hdl.handle.net/10214/10284
dc.description.abstract Longitudinal data (data that observe the same entities at different points in time) are of interest to historians and social scientists because they create opportunities to study populations over time. In this thesis, we construct longitudinal data by integrating four historical sources to study Canadian casualties of World War I. Because labeled data are unavailable for two of the three linkage tasks and our application has a low tolerance for false matches, we develop a simple stepwise deterministic strategy to integrate the four datasets. For the one linkage task where labeled data are available, we compare this strategy with linkage that incorporates a Support Vector Machine. With the longitudinal dataset constructed, we demonstrate its utility by performing a multivariate regression analysis to determine the factors that influenced a Canadian soldier's likelihood of survival in World War I. The findings indicate that a cautious stepwise deterministic strategy that incorporates approximate comparisons and domain knowledge can perform on par with a linkage approach that incorporates a supervised learning algorithm, without requiring labeled data. The regression analysis reveals several patterns of historical importance in early 20th century Canada that warrant further historical investigation. en_US
dc.language.iso en en_US
dc.rights Attribution-NonCommercial 2.5 Canada *
dc.rights.uri http://creativecommons.org/licenses/by-nc/2.5/ca/ *
dc.subject record linkage en_US
dc.subject data integration en_US
dc.subject historical data en_US
dc.subject longitudinal data en_US
dc.title Data Integration from Multiple Historical Sources to Study Canadian Casualties of WWI en_US
dc.type Thesis en_US
dc.degree.programme Computer Science en_US
dc.degree.name Master of Science en_US
dc.degree.department School of Computer Science en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.


Files in this item

Files Size Format View Description
Gadgil_Harshavardhan_201703_Msc.pdf 2.365Mb PDF View/Open MSc. Thesis

Except where otherwise noted, this item's license is described as Attribution-NonCommercial 2.5 Canada