Learn how to implement a hate speech detection model with just a few lines of Python code
Hate speech can be found all around the web. Even an innocent website can be overwhelmed with hate-speech comments at any time due to the anonymous nature of the internet. But thanks to innovations in the field of NLP, it’s now possible to implement advanced hate speech detection models with just a few lines of code.
There are many models available on Hugging Face’s model distribution network that we can use. A quick search for models with the keyword “hate” brings up 17 models, and I’m sure more will be added in the near future. For this article, we’ll be focusing on a model produced by a team of researchers from the Indian Institute of Technology Kharagpur. The repository for the model can be found here, and its paper can be found here.
For this tutorial, we’ll be using a library my team created called Happy Transformer. Happy Transformer is a wrapper built on top of Hugging Face’s Transformers library that allows programmers to implement and train Transformer models with just a few lines of code.
Install
pip install happytransformer
Import
from happytransformer import HappyTextClassification
Prediction Object Instantiation
HappyTextClassification takes two parameters. The first parameter, called model_type, indicates the kind of model. It currently supports “ALBERT,” “BERT,” “DISTILBERT,” and “ROBERTA.” However, all of the currently available hate speech models are either of type “BERT” or “ROBERTA.” The second parameter is called model_name, and as discussed, a list of models can be found here.
happy_tc = HappyTextClassification("BERT", "Hate-speech-CNERG/dehatebert-mono-english")
classify_text()
Each HappyTextClassification object contains a method called classify_text(). It simply requires a string and returns a dataclass with the variables “label” and “score.” The label variable is equal to “LABEL_0” when the input is not hate speech, and “LABEL_1” when it is. The score variable indicates how confident the model is, as a probability between 0 and 1.
result = happy_tc.classify_text("Good job Bert!")
print(result)
Result:
TextClassificationResult(label='LABEL_0', score=0.9751753211021423)
Note: For obvious reasons, I’m not including actual hate speech as an example. Let’s pretend that the following input contains hate speech.
result = happy_tc.classify_text("!&#*#&!*@&#*@*!*(#*")
print(result)
Result:
TextClassificationResult(label='LABEL_1', score=0.8652178049087524)
Extracting Results
result = happy_tc.classify_text("Let's use GPT-2 for the project")
print(result)
print(result.label)
print(result.score)
Result:
TextClassificationResult(label='LABEL_0', score=0.9721677899360657)
LABEL_0
0.9721677899360657
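Once you have the label and score fields, you can turn them into a readable verdict in your own code. The interpret() helper below, including the 0.8 confidence threshold, is a hypothetical addition for this article and not part of Happy Transformer:

```python
# Hypothetical helper: convert a (label, score) pair into a verdict.
# The 0.8 threshold is an arbitrary choice for illustration.
def interpret(label, score, threshold=0.8):
    if label == "LABEL_1" and score >= threshold:
        return "hate speech"
    if label == "LABEL_0" and score >= threshold:
        return "not hate speech"
    return "uncertain"  # low-confidence predictions may deserve human review

print(interpret("LABEL_0", 0.9721677899360657))  # not hate speech
print(interpret("LABEL_1", 0.51))                # uncertain
```

You could call interpret(result.label, result.score) on the dataclass returned by classify_text().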
Fine-tuning your own hate-speech detection model is incredibly easy with Hugging Face’s dataset distribution network and Happy Transformer. A quick search on Hugging Face’s dataset distribution network with the keyword “hate” results in 10 different datasets that you can use.
Below is a quick example of how to fine-tune a model to detect hate speech using Happy Transformer. Comment down below if you are interested in me creating a full tutorial on this.
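As a rough sketch, Happy Transformer’s train() method expects a CSV file with a “text” column and an integer “label” column. The toy examples and file name below are illustrative only, and the training call itself is shown commented out because it downloads the model:

```python
import csv

# Hypothetical toy training data; label 0 = not hate speech, 1 = hate speech.
# The second string is a placeholder standing in for real hate speech.
cases = [
    ("Good job Bert!", 0),
    ("!&#*#&!*@&#*@*!*(#*", 1),
]

# Write the data in the CSV format Happy Transformer's train() expects.
with open("train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "label"])
    writer.writerows(cases)

# Then fine-tune (requires downloading the model, so shown for reference):
# happy_tc = HappyTextClassification("BERT", "Hate-speech-CNERG/dehatebert-mono-english")
# happy_tc.train("train.csv")
```

In practice you would populate the CSV from one of the hate-speech datasets mentioned above rather than hand-written examples.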
Happy Transformer
https://github.com/EricFillion/happy-transformer
Colab
Code from this tutorial:
YouTube:
Vennify AI’s YouTube channel. Subscribe for new videos about NLP.