Keyword Extraction for Privacy Policy Analysis Using Topic Modelling Approaches

dc.contributor.advisor Song, Fei
dc.contributor.advisor Dara, Rozita
dc.contributor.author Chen, Sijie
dc.date.accessioned 2021-01-11T21:37:57Z
dc.date.available 2021-01-11T21:37:57Z
dc.date.copyright 2021-01
dc.date.created 2021-01-04
dc.identifier.uri https://hdl.handle.net/10214/23731
dc.description.abstract Privacy policies are official documents that inform users about how their data are collected and used by service providers. However, such documents are often verbose and full of legal jargon, making it difficult for ordinary users to read and understand them. Our research objective is to develop effective solutions for extracting keywords that can support the coverage and relevancy analysis of privacy policies with regard to the related data practices. To this end, we extended two topic models, LDA (Latent Dirichlet Allocation) and POSLDA (Part-of-Speech LDA), with prior information about different data practices and Part-of-Speech classes, and compared their performance for keyword extraction from privacy policies. We used the OPP-115 dataset for the optimization of the topic models and the evaluation of keyword extraction. Our results show that both LDA and POSLDA are capable of extracting quality keywords from privacy policies on various topics, and that POSLDA can not only distinguish the POS classes of keywords for different topics, but also improve the accuracy of keyword extraction by removing customized stop words derived from the same modelling process. en_US
dc.language.iso en en_US
dc.subject privacy policy en_US
dc.subject keyword extraction en_US
dc.subject topic modelling en_US
dc.title Keyword Extraction for Privacy Policy Analysis Using Topic Modelling Approaches en_US
dc.type Thesis en_US
dc.degree.programme Computer Science en_US
dc.degree.name Master of Science en_US
dc.degree.department School of Computer Science en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.


Files in this item

Files Size Format
Chen_Sijie_202101_MSc.pdf 1.600Mb PDF
