Enhancing Readability of Privacy Policies Through Ontologies

Audich, Dhiren
University of Guelph

Privacy policies operate as memorandums of understanding (MOUs) between the users and providers of online services. Research suggests that users are deterred from reading policies because of their length, difficult language, and insufficient information. Users are more likely to read short excerpts if these immediately address their concerns. As a first step in helping users find pertinent information in privacy policies, this thesis presents the development of a domain ontology using natural language processing (NLP) algorithms as a way to reduce costs and speed up development. By using the ontology to locate key parts of privacy policies, average reading times were substantially reduced from 8-12 minutes to 45 seconds. In the process of extracting keywords from the privacy policy corpus, a supervised NLP algorithm performed marginally better (7%) but showed greater promise with larger training sets. Additionally, trained non-domain experts achieved a combined F1-score of 71% when compared to a domain expert, and did so when extracting keywords from fewer policies.

privacy policy, ontology, natural language processing, NLP, artificial intelligence, keyword extraction, RAKE, TF-IDF, HCI, Human Computer Interaction, privacy, law, taxonomy, data privacy, feature extraction, comprehensive taxonomy, TextRank, AlchemyAPI, online privacy