Detection and Mitigation of Gender Bias in Natural Language Processing

Dawkins, Hillary
University of Guelph

This thesis contributes to our collective understanding of how gender bias arises in natural language processing (NLP) systems, provides new detection and measurement tools, and develops mitigation methods. More specifically, we quantify and reduce bias within pre-trained computational resources, both word embeddings and language models, such that unwanted outcomes produced by the system are mitigated. Unwanted outcomes include any system prediction that is unduly influenced by the presence of gender words or the latent concept of gender in language (e.g., when an NLP system is unable to predict that "she" refers to a doctor).

On the theme of detection, we make two new observations on how gender bias can manifest in system predictions. Firstly, gender words are shown to carry either marked or default values. Default values may pass through systems undetected, while marked values influence prediction outcomes. Secondly, we detect unwanted latent inferences that arise from a shared gender association. We contribute two new test sets, and one enhanced test set, for the purpose of gender bias detection.

On the theme of mitigation, we develop successful debiasing strategies applied to both types of pre-trained resources.

Natural language processing, Artificial intelligence, Gender bias, Word embeddings, Marked attribute bias, Latent gender bias, Pre-trained language models