Privacy Preserving Data Sanitization and Publishing

dc.contributor.advisor: Obimbo, Charlie
dc.contributor.author: Zaman, A. N. K.
dc.date.accessioned: 2017-12-18T15:05:53Z
dc.date.available: 2017-12-18T15:05:53Z
dc.date.copyright: 2017-12
dc.date.created: 2017-12-01
dc.date.issued: 2017-12-18
dc.degree.department: School of Computer Science [en_US]
dc.degree.grantor: University of Guelph [en_US]
dc.degree.name: Doctor of Philosophy [en_US]
dc.degree.programme: Computer Science [en_US]
dc.description.abstract: Recent trends show a drastic increase in the large data repositories held by corporations, governments, and healthcare organizations. According to Bernard Marr of Forbes Tech magazine (2015), the growth in data in 2014/15 alone was twice that created in the entire history of the human race. Data sharing is beneficial in areas such as healthcare services and collaborative research. However, there is a significant risk of compromising sensitive information, for example through de-anonymization. Privacy Preserving Data Publishing (PPDP) allows sanitized data to be shared while protecting individuals against identity disclosure. Removing explicit identifiers/personally identifiable information (PII) from a data set and making the data set compliant with the Health Insurance Portability and Accountability Act (HIPAA) does not guarantee the privacy of data donors. Data sanitization may be achieved in different ways, by k-anonymization, l-diversity, or delta-presence, to name but a few; however, the differential privacy paradigm provides the strongest privacy guarantee for sanitized data publishing. This research proposes two privacy preserving algorithms that satisfy the epsilon-differential privacy requirement and adopt the non-interactive privacy model for sanitizing and publishing data. Along with differential privacy, generalization and suppression of attributes are applied to impose privacy and to prevent re-identification of records in a data set.
The key contributions of this thesis are: 1) the proposed algorithm adopts the non-interactive model for data publishing; as a result, data miners have full access to the published data set for further processing, which promotes data sharing in a safe way; 2) the algorithm can sanitize micro and/or HIPAA-compliant data sets for publishing; 3) the published data is independent of an adversary's background knowledge; 4) the algorithm is independent of the choice of quasi-identifiers (QIDs); and finally, 5) it protects the published data set from the re-identification risk. Sanitized data published using the proposed algorithm is shown to have higher data usability, in terms of data classification accuracy, than other existing works, and to significantly reduce the risk of re-identification. [en_US]
dc.identifier.uri: http://hdl.handle.net/10214/12092
dc.language.iso: en [en_US]
dc.publisher: University of Guelph [en_US]
dc.rights.license: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: Data Anonymization [en_US]
dc.subject: Differential Privacy [en_US]
dc.subject: Re-identification Risk [en_US]
dc.subject: Secure Data Sharing [en_US]
dc.subject: Data Publishing [en_US]
dc.title: Privacy Preserving Data Sanitization and Publishing [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name:
Zaman_ANK_201712_PhD.pdf
Size:
1.93 MB
Format:
Adobe Portable Document Format