In this post, I summarise the works presented at EMNLP 2020 that explicitly tackle gender bias (as opposed to works that aim to remove bias of any type).
Background
Ethics and fairness in Artificial Intelligence have become a hot topic in research in the last few years. Since EMNLP 2020 is a premier venue, I was interested to know how the work presented there addresses the sociological aspects of gender bias in natural language processing (NLP).
Mitigating Gender Bias for Neural Dialogue Generation with Adversarial Learning
This work focusses on reducing gender bias in dialogue generation systems. The authors identify three major sources of bias: the identity of the speaker, the identity of the “answerer” and the identity of the actors in the content. The spotlight here is on the content, as the following example illustrates.
When the message is “Really wishes he could take at least one step on this husker floor…”, the reply is “I’m sure he’s going to be a great guest.” Changing the pronoun produces a rather skewed response: for “Really wishes she could take at least one step on this husker floor…”, the reply is “I’m sure she’s a little jealous.”
The idea is realised through adversarial networks. An adversarial network is an unsupervised model with two modules: a generative one and a discriminative one. The generative model, as the name suggests, automatically generates examples (vectors) from the model domain. The discriminative model differentiates between the generated examples and the real ones.
There are two steps involved in this work: building discriminators that encapsulate gender and semantic information, and building a dialogue model that provides debiased chats.
In this context, previous literature provides a quantitative way of removing tell-tale gendered words (based on career, family, extremes of positive and negative emotion, etc.). This produces a set of ungendered utterances. The discriminator classifies a piece of text into its underlying gender (in this work, male or female). The representation of the words during dialogue generation is improved by incorporating these adversarial examples as well. Designing the losses so that the generated dialogues are gender neutral greatly reduces the bias. These representations are then fed into the dialogue model on a supervised dataset.
This work differs from existing literature in that the generated representations do not simply subtract the gender information from the responses; they retain positive gender traits and remove only the biased components.
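To make the word-removal step a little more concrete, here is a minimal sketch of how I picture it. It is not the authors' code: the lexicon `TELL_TALE_WORDS` and the function `strip_tell_tale_words` are names I made up, and the real word lists from the earlier literature are much larger.

```python
import re

# Hypothetical lexicon of "tell-tale" words (career, family, strong emotion).
# Illustrative only; the actual lists come from prior literature.
TELL_TALE_WORDS = {"nurse", "engineer", "kitchen", "career", "gorgeous", "bossy"}


def strip_tell_tale_words(utterance, lexicon=TELL_TALE_WORDS):
    """Return the utterance with every lexicon word removed (case-insensitive)."""
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", utterance)
    kept = [token for token in tokens if token.lower() not in lexicon]
    return " ".join(kept)


utterances = [
    "She is a gorgeous nurse",
    "He has a great career as an engineer",
]
ungendered = [strip_tell_tale_words(u) for u in utterances]
```

A plain lexicon filter like this leaves grammatically broken sentences behind, which is one reason to treat it only as an illustration of the idea.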
A successful example of debiasing is demonstrated below.
He ain’t cooking, that’s the problem! — I know right?
She ain’t cooking, that’s the problem! — I was thinking how much I love her.
The second reply is a monumental improvement over the naïve model’s original prediction of “She’s a b****” 😛
Multi-Dimensional Gender Bias Classification
This work by Facebook AI Research provides a way of estimating the extent of gender bias in text, which has applications in various language modelling exercises. The approach also aims to tackle bias outside the gender binary. The central theme is to identify how gender has been constructed, from the perspective of the speaker, the listener or the person being spoken about, described as “FROM/TO/AS”. The paper introduces an annotated dataset, MDGender, which covers these forms of gender expression. Transformers are used to perform the classification. The applications of these classifiers are demonstrated in generation, bias detection and the study of the correlation between offensive content and gender bias.
Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias
This work shifts the focus away from English and onto the challenges of languages that contain Type B Reflexivization. The idea is that some language structures are gendered and may introduce bias. For example, a non-gendered English sentence such as “The surgeon placed the book on the table” may be translated into a sentence in another language which suggests that the surgeon is male. This is seen in many Indian languages as well. The work proposes an Anti-Bias Challenge dataset that focusses on such constructions and demonstrates it on several NLP tasks, namely Natural Language Inference, Coreference Resolution, Machine Translation and Language Modelling.
Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
This work was a bit of a tough read for me, as it sits more in the theoretical realm. From what I understood, the seminal work in this field assumed that a linear subspace of the word embeddings captures bias. Here, a similar experiment was conducted with a non-linear subspace, and the authors concluded that the performance differences between the two are negligible, so the linear assumption holds.
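To get some intuition for the linear assumption, here is a minimal sketch of removing a single "gender direction" from a word vector by projection. It is a simplification under my own assumptions: the 3-d vectors are invented, the direction is taken from just one word pair, and the helper names `dot` and `remove_direction` are mine, not the paper's.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def remove_direction(vec, direction):
    """Subtract from `vec` its component along `direction`."""
    scale = dot(vec, direction) / dot(direction, direction)
    return [a - scale * d for a, d in zip(vec, direction)]


# Toy 3-d "embeddings", made up for illustration.
he = [1.0, 0.2, 0.0]
she = [-1.0, 0.2, 0.0]
doctor = [0.4, 0.5, 0.3]

# Assumption: a single difference vector stands in for the bias subspace.
gender_direction = [a - b for a, b in zip(he, she)]
debiased_doctor = remove_direction(doctor, gender_direction)
```

After the projection, `debiased_doctor` has no component left along `gender_direction`, while its other coordinates are unchanged. That is the sense in which the bias is assumed to live in a linear subspace.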
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation
This provocatively titled work also focusses on mitigating gender bias in dialogue systems, much like the first work. The authors offer three ways of mitigating gender bias: counterfactual data augmentation, targeted data collection, and bias-controlled training.
Concretely defining gender bias in data is hard because of a circular problem: humans are biased, hence the data is biased, and so human annotators may also inadvertently introduce bias. Hence, multiple ideas and annotations were aggregated.
Counterfactual augmentation is the simple task of duplicating every gendered sentence with its binary opposite. This tactic might often result in noisy data. The second method is targeted data collection: human annotators generate bias-free dialogues, with sufficient care taken over semantics and diversity. In the third method, a model is constructed such that it is aware of gender, so that its usage of gendered words can be controlled.
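Counterfactual augmentation is easy to picture in code. The sketch below is my own toy version, not the authors' pipeline: the swap list `SWAPS` is a tiny hypothetical sample, and `swap_gendered_words` and `augment` are names I chose for illustration.

```python
import re

# Hypothetical, deliberately tiny swap list; real lists are far longer.
SWAPS = {
    "he": "she", "she": "he",
    "king": "queen", "queen": "king",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}


def swap_gendered_words(sentence, swaps=SWAPS):
    """Replace each listed gendered word with its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = swaps.get(word.lower())
        if swapped is None:
            return word
        # Preserve a leading capital letter.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(sentences):
    """Keep every sentence and add a counterfactual copy of each gendered one."""
    augmented = []
    for sentence in sentences:
        augmented.append(sentence)
        counterfactual = swap_gendered_words(sentence)
        if counterfactual != sentence:
            augmented.append(counterfactual)
    return augmented


data = augment(["The king is powerful.", "It rained."])
```

I left words like "her" out of the list on purpose: "her" can map to either "his" or "him" depending on context, and blind word swapping cannot tell which. That kind of ambiguity is one way the noise creeps in.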
Unsupervised Discovery of Implicit Gender Bias
While all the above works were in the space of supervised learning, this one is positioned in the unsupervised space. It is unsupervised with respect to bias, but it is nevertheless a classification task. The interesting contribution is that the annotation required does not depend on a human annotator’s interpretation of bias; all that is required is the factual information of whether the addressee of the statement is male or female.
The approach used is, again, adversarial. The task is to identify the gender given the input text. The most interesting experiment is the evaluation of these representations on a sexism detection task, even though the models were not trained for it. Since there is significant improvement over the random dataset, there is certainly exciting scope for further research.
And that wraps up the discussion on how gender bias has been tackled at EMNLP 2020. As can be clearly seen, we have a long way to go, but the journey promises to be highly interesting and impactful.