Keyword Extraction for Privacy Policy Analysis Using Topic Modelling Approaches

Title: Keyword Extraction for Privacy Policy Analysis Using Topic Modelling Approaches
Author: Chen, Sijie
Department: School of Computer Science
Program: Computer Science
Advisor: Song, Fei; Dara, Rozita
Abstract: Privacy policies are official documents that inform users about how their data are collected and used by service providers. However, such documents are often verbose and full of legal jargon, making it difficult for ordinary users to read and understand them. Our research objective is to develop effective solutions for extracting keywords that support the coverage and relevancy analysis of privacy policies with regard to the related data practices. To this end, we extended two topic models, LDA (Latent Dirichlet Allocation) and POSLDA (Part-of-Speech LDA), with prior information about different data practices and Part-of-Speech classes, and compared their performance on keyword extraction from privacy policies. We used the OPP-115 dataset to optimize the topic models and to evaluate keyword extraction. Our results show that both LDA and POSLDA are capable of extracting quality keywords from privacy policies on various topics, and that POSLDA can not only distinguish the POS classes of keywords for different topics, but also improve the accuracy of keyword extraction by removing custom stop words derived from the same modelling process.
URI: https://hdl.handle.net/10214/23731
Date: 2021-01
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
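
The abstract describes extending LDA with prior information about data practices and then reading keywords off the learned topics. The sketch below illustrates that general idea only; it is not the thesis implementation. It assumes a collapsed Gibbs sampler and assumes that the prior is injected by boosting the per-topic Dirichlet prior of a few seed words. The function names (`fit_lda`, `top_keywords`), the `seed_boost` parameter, the seed words, and the toy documents are all hypothetical and are not drawn from OPP-115.

```python
import random


def fit_lda(docs, num_topics, seed_words=None, alpha=0.1, beta=0.01,
            seed_boost=1.0, iterations=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    seed_words maps a topic index to words whose prior is boosted for
    that topic (an assumed way of encoding prior knowledge).
    Returns (topic_word_counts, vocab).
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Per-topic word prior: symmetric beta plus a boost for seed words.
    prior = [[beta] * vocab_size for _ in range(num_topics)]
    for k, words in (seed_words or {}).items():
        for w in words:
            if w in word_id:
                prior[k][word_id[w]] += seed_boost
    prior_sum = [sum(row) for row in prior]

    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + prior[t][v])
                    / (topic_total[t] + prior_sum[t])
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                topic_word[k][v] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1

    return topic_word, vocab


def top_keywords(topic_word, vocab, n=3):
    """Return the n most frequent words per topic (ties broken alphabetically)."""
    result = []
    for counts in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda v: (-counts[v], vocab[v]))
        result.append([vocab[v] for v in ranked[:n]])
    return result


# Toy, hand-written token lists standing in for privacy policy segments.
docs = [
    ["collect", "email", "address", "collect", "location"],
    ["collect", "device", "location", "email"],
    ["share", "partners", "advertising", "share"],
    ["share", "third", "partners", "advertising"],
]
topic_word, vocab = fit_lda(
    docs, num_topics=2, seed_words={0: ["collect"], 1: ["share"]}, iterations=50
)
keywords = top_keywords(topic_word, vocab, n=3)
print(keywords)
```

A POS-aware variant such as POSLDA would additionally model a part-of-speech class for each token; that extension is not attempted here.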


Files in this item

File: Chen_Sijie_202101_MSc.pdf
Size: 1.600Mb
Format: PDF
