
We will present three binary text classification models using CNN, LSTM, and BERT.
Data Preprocessing
Because our data comes from social networks such as Twitter and Facebook, the original dataset contains a lot of useless or noisy text. Before feeding the data into an NLP model for training, we first need to clean it. The steps we followed are listed below, and a small sketch implementing a few of them follows the list; you can modify any of these rules to suit your own cleaning needs.
- drop null rule
- retweet rule
- hashtag rule
- markup rule
- url rule
- email rule
- number rule
- remove punctuation
- remove nonprintable
- remove non-ASCII characters
- lemmatize rule
- tokenize rule
- vectorization rule
- etc.
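As a rough illustration, here is a minimal sketch of how some of these rules could be implemented with regular expressions; the exact patterns are assumptions and should be adapted to your own data (lemmatization, tokenization, and vectorization are left to whatever NLP toolkit you use).

```python
import re
import string

def clean_text(text):
    """Apply a handful of the cleaning rules above (illustrative patterns only)."""
    if text is None:                                   # drop null rule
        return ""
    text = re.sub(r"^RT\s+@\w+:?\s*", "", text)        # retweet rule
    text = re.sub(r"#\w+", "", text)                   # hashtag rule
    text = re.sub(r"<[^>]+>", "", text)                # markup rule
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # url rule
    text = re.sub(r"\S+@\S+", "", text)                # email rule
    text = re.sub(r"\d+", "", text)                    # number rule
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    text = "".join(ch for ch in text if ch.isprintable())             # remove nonprintable
    return text.lower().strip()
```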
Experiments with CNN and LSTM
The CNN (Convolutional Neural Network) model is normally used for image tasks, but we use it here as a baseline to establish a lower bound on performance. The model definition is shown below.
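A minimal Keras sketch of such a 1D-convolution baseline might look like the following; the vocabulary size, embedding dimension, and filter settings are assumptions rather than the exact values used in the experiment.

```python
import tensorflow as tf

vocab_size = 10000  # assumed vocabulary size
embed_dim = 128     # assumed embedding dimension

cnn_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
```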
The LSTM (Long Short-Term Memory) model is a Recurrent Neural Network (RNN) variant with gated feedback connections that help it retain long-range context. For NLP, an LSTM or RNN makes more sense than a CNN, because words later in a sentence can change how earlier words should be interpreted. The LSTM model definition is shown below.
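A comparable Keras sketch for the LSTM baseline is shown next; again, the layer sizes are assumed, and the bidirectional wrapper is one way to let later words influence the representation of earlier ones.

```python
import tensorflow as tf

vocab_size = 10000  # assumed vocabulary size
embed_dim = 128     # assumed embedding dimension

lstm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # context in both directions
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
```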
Experiment using BERT
Transfer learning is a very powerful technique in machine learning.
It focuses on storing knowledge gained while solving one problem and applying it to a different but related problem (Reference 2).
BERT (Bidirectional Encoder Representations from Transformers) is a deep bidirectional Transformer pre-trained for language understanding. We can simply add a couple of layers on top of the pre-trained BERT model. In our case, on top of the sequence_output layer, we add 4 more dense layers with dropout and a regularizer. In the last layer, because we are doing binary classification, num_classes is 2. The full BERT-based model is shown below.
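Below is a sketch of how such a model could be assembled with TensorFlow Hub and Keras; the hub URL/version, dense-layer sizes, dropout rate, and L2 strength are assumptions, and the exact input/output signature depends on the hub module version you load.

```python
import tensorflow as tf
import tensorflow_hub as hub
from tensorflow.keras import regularizers

max_len = 128        # assumed maximum sequence length
num_classes = 2      # binary classification

# Pre-trained BERT encoder from TF Hub (URL/version assumed).
bert_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/4",
    trainable=True)

input_word_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
input_mask     = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_mask")
input_type_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_type_ids")

outputs = bert_layer({"input_word_ids": input_word_ids,
                      "input_mask": input_mask,
                      "input_type_ids": input_type_ids})
sequence_output = outputs["sequence_output"]   # (batch, max_len, 1024)

# Four extra dense layers with dropout and an L2 regularizer (sizes assumed).
x = sequence_output[:, 0, :]                   # [CLS] token representation
for units in (512, 256, 128, 64):
    x = tf.keras.layers.Dense(units, activation="relu",
                              kernel_regularizer=regularizers.l2(1e-4))(x)
    x = tf.keras.layers.Dropout(0.3)(x)

predictions = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

bert_model = tf.keras.Model(
    inputs=[input_word_ids, input_mask, input_type_ids],
    outputs=predictions)
```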
CNN vs LSTM vs BERT
For all three models, we calculate performance metrics such as Precision, Recall, AUC, and Accuracy. Each model was trained for 15 epochs.
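For illustration, the metrics and the 15-epoch training run could be wired up roughly as follows; model, train_ds, and val_ds are placeholder names for one of the models above and your own data pipelines, and while the sigmoid-output CNN/LSTM models use binary_crossentropy, the two-class softmax BERT head would use sparse_categorical_crossentropy instead.

```python
import tensorflow as tf

# Metric set matching the numbers reported in the comparison.
metrics = [
    tf.keras.metrics.Precision(name="precision"),
    tf.keras.metrics.Recall(name="recall"),
    tf.keras.metrics.AUC(name="auc"),
    tf.keras.metrics.BinaryAccuracy(name="accuracy"),
]

# model, train_ds, and val_ds are placeholders for one of the models above
# and your own tf.data pipelines.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
              loss="binary_crossentropy",
              metrics=metrics)

history = model.fit(train_ds, validation_data=val_ds, epochs=15)
```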
We find that BERT has more than 167 times as many parameters as the other models; it takes much longer to train but achieves better performance. The BERT model we use is bert_en_uncased_L-24_H-1024_A-16.