A comparison of machine learning classifiers for use on historical record linkage

Title: A comparison of machine learning classifiers for use on historical record linkage
Author: Kaur, Pavneet
Department: School of Computer Science
Program: Computer Science
Advisor: Antonie, Dr. Luiza
Abstract: Record linkage is the process of identifying records that refer to the same entity within or across one or more data sources in the absence of unique identifiers. Longitudinal data constructed by linking two or more historical sources can provide valuable information about how populations change over time. However, constructing such longitudinal data is made difficult by the lack of personal identifiers. In this thesis, we link the 1871 and 1881 Canadian censuses using different methods and conditions. Support Vector Machine (SVM) and Random Forest classifiers are used to build the record linkage system. We compare the performance of these methods to determine the upper bound on the linkage rate and examine the conditions under which that rate is obtained. Experiments show that the Random Forest classifier explored in this thesis improves the linkage rate by 3.6% while keeping the false positive rate at or below 5%.
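The abstract reports a linkage rate achieved under a 5% false positive ceiling but does not spell out how that trade-off is computed. A minimal sketch, assuming a trained classifier (for example an SVM or Random Forest) has already given each candidate 1871/1881 record pair a match score, assuming the linkage rate is the share of 1871 records that receive a link, and assuming the false positive rate is the share of accepted links that are wrong: sweep the score cutoff and keep the one that links the most records without exceeding the ceiling. The `CandidatePair` type, the function names, and the toy data are hypothetical and not taken from the thesis.

```python
"""Sketch: pick a match-score cutoff that maximizes linkage rate subject to
a false positive ceiling. Definitions below are assumptions, not the thesis's."""
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CandidatePair:
    """Hypothetical candidate link between an 1871 and an 1881 census record."""
    id_1871: str
    id_1881: str
    score: float    # classifier output, e.g. an SVM or Random Forest match probability
    is_match: bool  # ground-truth label from a hand-linked evaluation sample


def rates_at_threshold(
    pairs: List[CandidatePair], threshold: float, n_records: int
) -> Tuple[float, float]:
    """Return (linkage_rate, false_positive_rate) when accepting score >= threshold.

    Assumed definitions:
      linkage rate        = distinct 1871 records linked / all 1871 records
      false positive rate = accepted links that are wrong / all accepted links
    """
    accepted = [p for p in pairs if p.score >= threshold]
    if not accepted:
        return 0.0, 0.0
    wrong = sum(1 for p in accepted if not p.is_match)
    linked = {p.id_1871 for p in accepted}
    return len(linked) / n_records, wrong / len(accepted)


def best_threshold(
    pairs: List[CandidatePair], n_records: int, max_fp_rate: float = 0.05
) -> Optional[Tuple[float, float, float]]:
    """Try every observed score as a cutoff and return (cutoff, linkage_rate,
    fp_rate) with the highest linkage rate whose fp_rate <= max_fp_rate,
    or None if no cutoff satisfies the ceiling."""
    best: Optional[Tuple[float, float, float]] = None
    for t in sorted({p.score for p in pairs}):
        linkage, fp = rates_at_threshold(pairs, t, n_records)
        if fp <= max_fp_rate and (best is None or linkage > best[1]):
            best = (t, linkage, fp)
    return best


if __name__ == "__main__":
    # Toy data, not from the thesis: 10 records in 1871, 5 scored candidate pairs.
    toy = [
        CandidatePair("A1", "B1", 0.95, True),
        CandidatePair("A2", "B2", 0.90, True),
        CandidatePair("A3", "B3", 0.80, True),
        CandidatePair("A4", "B9", 0.70, False),
        CandidatePair("A5", "B5", 0.60, True),
    ]
    print(best_threshold(toy, n_records=10))
```

A production linkage system would normally also enforce one-to-one links (each 1871 record matched to at most one 1881 record) before measuring these rates; the sketch only counts distinct 1871 identifiers and skips that step for brevity.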
Date: 2020-05
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.

Files in this item

File: Kaur_Pavneet_202005_Msc.pdf
Size: 2.503 MB
Format: PDF
Description: Master's Thesis
