
Before digging deep, let’s first understand what sentiment analysis means in a machine learning context. It is also known as emotion AI or opinion mining.
In simple words, it is a branch of Natural Language Processing in which we categorise a piece of text into various kinds of emotion (like anger, love, hate, etc.).
RNN is short for Recurrent Neural Network. It takes the output obtained in the previous step into account as an input alongside the current input, which makes it well suited to sequence data. This technique is widely used for processing text and speech as a series, and it can also be used to generate sequential output.
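As a minimal sketch of this idea (using TensorFlow/Keras, which the rest of this article also uses; the shapes below are illustrative assumptions), a recurrent layer consumes a sequence step by step and carries a hidden state between steps:
import numpy as np
from tensorflow import keras

# A toy recurrent layer: at each of the 10 time steps it combines the
# current input with the hidden state from the previous step.
rnn = keras.layers.SimpleRNN(4, return_sequences=True)
x = np.random.rand(2, 10, 8).astype("float32")  # (batch, time steps, features)
print(rnn(x).shape)  # (2, 10, 4): one output vector per time step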
Long short-term memory (LSTM) is an artificial neural network architecture with a working principle similar to that of the recurrent neural network (RNN).
LSTM consists of different types of gates. The LSTM cell gates control the flow of information: what to accept as input, what to discard, and what to produce as output. LSTM avoids the vanishing/exploding gradient problem that is frequently observed in standard RNNs.
There are three different types of gates:
- Input Gate
- Output Gate
- Forget Gate
The cell state works as a transporter of relevant information from one cell to the next. It carries useful information forward, allowing relevant information from earlier steps to be passed down to later steps.
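A small sketch of this in Keras (the input shapes are illustrative assumptions): passing return_state=True exposes the LSTM's two memories, the hidden state and the cell state that carries information across steps.
import numpy as np
from tensorflow import keras

# return_state=True returns the final output plus the LSTM's two memories:
# the hidden state (short-term) and the cell state (the long-term carrier).
lstm = keras.layers.LSTM(32, return_state=True)
x = np.random.rand(2, 10, 8).astype("float32")  # (batch, time steps, features)
output, hidden_state, cell_state = lstm(x)
print(output.shape, hidden_state.shape, cell_state.shape)  # each (2, 32)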
GRU is a simpler version of LSTM. In most cases GRU performance is similar to that of LSTM, and training a GRU is faster. A GRU cell contains two types of gates: the Reset Gate and the Update Gate.
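One hedged way to see why a GRU trains faster is to compare trainable parameter counts for layers of the same size; the exact numbers depend on the input dimension (8 features are assumed here), but the GRU layer, having one gate fewer, is noticeably smaller.
from tensorflow import keras

# Build an LSTM and a GRU layer of the same width on the same input
# shape and compare their trainable parameter counts.
inp = keras.layers.Input(shape=(None, 8))
lstm_params = keras.Model(inp, keras.layers.LSTM(128)(inp)).count_params()
gru_params = keras.Model(inp, keras.layers.GRU(128)(inp)).count_params()
print(lstm_params, gru_params)  # the GRU layer has fewer parameters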
We can download a tokenised data set directly from Keras (the IMDB reviews set used below). Otherwise, if the data set is present in raw form, we have to tokenise it ourselves; the steps are covered in the next section. You can easily find an example data set for sentiment analysis on Kaggle.
from tensorflow import keras
max_features = 20000  # keep only the 20,000 most frequent words
(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(
    num_words=max_features)
Tokenisation converts a sentence into a vector of numbers so that the model can manipulate the input and classify the data.
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_words = 20000
max_len = 200

# x1 is the raw text data (a list of review strings)
tokenizer = Tokenizer(num_words=max_words, split=' ')
tokenizer.fit_on_texts(x1)
sequences = tokenizer.texts_to_sequences(x1)
x_train = pad_sequences(sequences, maxlen=max_len)
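For a quick sanity check after fitting, you can run a new sentence through the same pipeline (the sentence here is a made-up example; the actual indices depend on the fitted vocabulary):
# Each word becomes its learned integer index, unseen words are dropped
# (no OOV token was configured), and the sequence is left-padded with
# zeros up to max_len.
demo = tokenizer.texts_to_sequences(["the movie was great"])
demo = pad_sequences(demo, maxlen=max_len)
print(demo.shape)  # (1, 200)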
In the model, the first layer is an Embedding layer. The Embedding layer learns a dense vector representation for each word, capturing relationships between words. When the input vocabulary is enormous, it reduces the dimensionality of the input while preserving the information the model needs. Here we use two GRU layers with 128 units each; GRU is lightweight compared to LSTM and faster to train, with almost equal and sometimes even better accuracy. The last layer is a Dense layer; change its number of units according to the number of sentiment classes in the data set.
embed_size = 128

# Input: variable-length sequences of integers
# Embed each integer in a 128-dimensional vector
model3 = keras.models.Sequential([
    keras.layers.Embedding(max_features, embed_size,
                           mask_zero=True,  # ignore the zero padding
                           input_shape=[None]),
    keras.layers.GRU(128, return_sequences=True),
    keras.layers.GRU(128),
    # 6 output units here; use 1 unit for a binary data set such as IMDB,
    # or match the number of sentiment classes in your own data set
    keras.layers.Dense(6, activation="sigmoid")
])
model3.summary()
We use Adam as the optimizer and binary cross-entropy as the loss function. Generally, the validation set is 20% of the data set. In this case we can quickly reach an accuracy of 90% or above.
model3.compile("adam", "binary_crossentropy", metrics=["accuracy"])
model3.fit(x_train, y_train, batch_size=32, epochs=3,
           validation_data=(x_val, y_val))
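If you start from a raw data set (for example one from Kaggle) instead of the pre-split IMDB data, one common way to get the 80/20 split mentioned above is scikit-learn's train_test_split; padded_sequences and labels below are placeholder names for your own padded inputs and sentiment labels.
from sklearn.model_selection import train_test_split

# Hold back 20% of the examples for validation, as suggested above.
# padded_sequences / labels are placeholder names for your own data.
x_train, x_val, y_train, y_val = train_test_split(
    padded_sequences, labels, test_size=0.2, random_state=42)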
- Sentiment analysis helps categorise a sentence into different classes based on the emotion it conveys.
- The initial step is to tokenise the sentence into a vector of numbers.
- The Embedding layer learns the relationships between words.
- Using GRU in the model lets you train faster than with LSTM.
- Divide the data set into 80% for training and 20% for validation to attain an accurate result.