Counterfactual text with seq2seq

In May 2020, I posted a project where I used spaCy and BERT to “flip” gender in Spanish sentences (un profesor viejo <-> una profesora vieja). This was useful for evaluating models’ biases or augmenting training data, but it was slow and depended on hardcoded variables in my script. At the time, I suggested the next step would be a sequence-to-sequence (seq2seq) neural network, the kind of model often used to translate or summarize text.

In addition to bias evaluation and data augmentation, I’ve collected more reasons to use counterfactuals in any language:

train chatbots on an equal selection of messages where the user or chatbot is addressed as male, female, or gender-neutral
change multi-dimensional properties (such as dialect or political views) to test content moderation or other complex models
disguise or ‘standardize’ communication in human-in-the-loop systems

A seq2seq model is built on two neural models: an encoder and a decoder. After looking at Fairseq and several seq2seq tutorials, I moved ahead with SimpleTransformers.

My first attempt used training data from the MuchoCine dataset, with BETO -mBERT for encoding and decoding, but every input seemed to return unrelated movie review text.
For the next attempt, I created a large gender-flip dataset (about 7,000 lines) from the Spanish OSCAR corpus. Using BETO for both encoder and decoder, I then trained the model for 100 epochs. The finished model works well, except that its output sequences are currently cut off at a length limit.
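Here is a rough sketch of what that second setup might look like with SimpleTransformers. It is not copied from my notebooks: the two toy sentence pairs, the BETO checkpoint name (dccuchile/bert-base-spanish-wwm-cased), the max_length value, and the output directory are all assumptions for illustration. Only the 100 epochs and the BETO-to-BETO pairing come from the description above.

```python
# Toy pairs for illustration only; the real dataset had about 7,000 lines.
PAIRS = [
    ("un profesor viejo", "una profesora vieja"),
    ("una profesora vieja", "un profesor viejo"),
]


def build_rows(pairs):
    """Turn (source, flipped) pairs into rows with the column names
    that SimpleTransformers' seq2seq trainer expects."""
    return [{"input_text": src, "target_text": tgt} for src, tgt in pairs]


def train(rows, output_dir="outputs/"):
    """Fine-tune a BERT-to-BERT encoder-decoder on the paired rows."""
    # Imported here so build_rows() works without these packages installed.
    import pandas as pd
    from simpletransformers.seq2seq import Seq2SeqArgs, Seq2SeqModel

    args = Seq2SeqArgs()
    args.num_train_epochs = 100
    args.max_length = 64  # assumed cap on generated tokens
    args.output_dir = output_dir
    args.overwrite_output_dir = True

    beto = "dccuchile/bert-base-spanish-wwm-cased"  # assumed BETO checkpoint
    model = Seq2SeqModel(
        encoder_type="bert",
        encoder_name=beto,
        decoder_name=beto,
        args=args,
        use_cuda=False,
    )
    model.train_model(pd.DataFrame(rows))
    return model


# Usage (downloads the checkpoint and trains, so it is left commented out):
# model = train(build_rows(PAIRS))
# print(model.predict(["un profesor viejo"]))
```

The generation cap (max_length) is the kind of setting that would produce a length limit on output sequences, so it is the first thing I would check when longer sentences get cut off.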

You can see training and basic usage notebooks for the model uploaded to HuggingFace.

Gender factors into nouns, pronouns, adjectives, and verbs in Modern Standard Arabic. Nouns and adjectives are often coded as female by adding ة (“tah”) to the end. Plenty of words are not so easy to modify, and then there are verbs, whose rules I never learned in an intro class: sometimes the beginning changes (يعمل / تعمل), sometimes the end (كتب / كتبت).
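The three patterns mentioned above can be written as naive string rules. This is only a sketch under strong assumptions: unvocalized text, third-person singular verbs, and regular words. The function names are hypothetical, and irregular words would need a dictionary or a trained model instead.

```python
TA_MARBUTA = "\u0629"  # ة, the ending often added to nouns and adjectives
TA = "\u062a"          # ت
YA = "\u064a"          # ي


def feminize_noun(word):
    """Append ta marbuta unless the word already ends with it."""
    return word if word.endswith(TA_MARBUTA) else word + TA_MARBUTA


def feminize_past_verb(verb):
    """The end changes: كتب becomes كتبت."""
    return verb + TA


def feminize_present_verb(verb):
    """The beginning changes: يعمل becomes تعمل."""
    if not verb.startswith(YA):
        raise ValueError("expected a verb starting with the masculine prefix")
    return TA + verb[1:]
```

Rules like these only cover the easy cases, which is part of why a learned seq2seq model is appealing here.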

While researching this part, I noticed that Google Translate could use some translation tests of its own: here, ‘he runs’ and ‘she runs’ generate completely different sentences, because he runs [a race] and she runs/manages [a business].

Google Translate testing

I updated a Python 2-era dictionary app (arramooz) to recreate its SQL database of Arabic nouns and verbs. Unfortunately, the ‘feminin’ attribute is missing from most words and, where it is present, is rarely meaningful. This work is still ongoing.

Finding words which are meaningfully flipped in the arramooz dictionary

Prayer Book by ‘Abd al-Qadir Hisari, Metropolitan Museum of Art

We could also use a seq2seq approach to create Arabic counterfactuals in multiple dialects.

Political counterfactuals were a difficult one, not because there is a lack of political Tweets in American English, but because it is generally harder to say what a counterfactual ought to look like. Consider this text:

“Congressional Republicans are using fuzzy math to justify their scheme to drill for oil in the Arctic National Wildlife Refuge and destroy one of the last pristine landscapes on Earth,” Wyden said.

One obvious counterfactual would support drilling in the Arctic National Wildlife Refuge (ANWR). Should it be any political counterargument (our nation needs more domestically produced energy), or should it be as semantically similar as possible? Here’s my first pass at a human-written counterfactual:

“Congressional [Democrats] are using [__] to justify [_blocking_] drilling in the Arctic National Wildlife Refuge and destroy [_our economy_],” [alt name] said.

To generate this counterfactual, I flipped Republican/Democrat names, identified the core topic (drilling in the ANWR), and replaced chunks of text between the verbs.
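The first of those steps, flipping party names, is the only one that is easy to automate. A minimal sketch, assuming a small hand-written swap table (the table and the function name are hypothetical, and forms such as ‘Democratic’ or ‘GOP’ are not handled):

```python
import re

PARTY_SWAP = {
    "Republicans": "Democrats",
    "Democrats": "Republicans",
    "Republican": "Democrat",
    "Democrat": "Republican",
}

# One combined pattern, longest names first, so that each mention is
# replaced exactly once and is never flipped back by a second pass.
_PARTY_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(PARTY_SWAP, key=len, reverse=True)) + r")\b"
)


def flip_parties(text):
    """Swap Republican/Democrat mentions in a single pass over the text."""
    return _PARTY_PATTERN.sub(lambda m: PARTY_SWAP[m.group(1)], text)
```

Identifying the core topic and rewriting the chunks between the verbs still took human judgment, which is where a trained model would have to do the real work.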
Another option might be to preserve the text ‘Congressional Republicans’ and ANWR, but try to ‘flip’ the sentiment, like this:

“Congressional Republicans are using [_the economy_] to justify their [_plan_] to drill for oil in the Arctic National Wildlife Refuge and [__] one of the last [__] on Earth,” [alt name] said.

Both counterfactuals bring up the risk of ‘translating’ quotes — we should have a plan to avoid outputs which attribute generated quotes to someone.
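One possible guard, sketched here as an assumption rather than a finished plan, is to detect the pattern of a quoted passage followed by ‘Name said’ and replace the speaker with a placeholder before any generated text is shown. The regular expression and function names are hypothetical, and they only cover this one attribution style.

```python
import re

# A quoted passage (straight or curly quotes), then a capitalized name,
# then the word "said".
ATTRIBUTED_QUOTE = re.compile(
    r'([“"][^“”"]+[”"],?\s+)'                 # the quote itself
    r'([A-Z][\w.-]*(?:\s+[A-Z][\w.-]*)*)'     # the speaker's name
    r'(\s+said\b)'
)


def has_attributed_quote(text):
    """Return True if the text attributes a quote to a named speaker."""
    return ATTRIBUTED_QUOTE.search(text) is not None


def strip_attribution(text, placeholder="[alt name]"):
    """Replace the named speaker with a placeholder, keeping the quote."""
    return ATTRIBUTED_QUOTE.sub(
        lambda m: m.group(1) + placeholder + m.group(3), text
    )
```

A check like has_attributed_quote could also be used to hold such outputs back for human review instead of rewriting them.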

I’d like to write a formal research paper around this seq2seq approach, and whether it is measurably useful in data augmentation and bias detection. I don’t know yet if political, dialect, or other counterfactuals would be in that project. There is still a lot of work to do. Anyway, do get in touch.
