“Understanding is a two-way street.” - Eleanor Roosevelt
In one of my last blogs, I tried to explain text generation through a simple Maximum Likelihood model.
In this blog, I will try to explain how we can do the same with a Bidirectional LSTM model.
In one of my blogs on RNNs, we talked about all the types of RNNs, but they had a shortcoming: they depend on context only from the past.
Bidirectional LSTMs train on two sides of the input sequence instead of one: first from left to right, and second in reversed order. This gives each word context from the words that come both before and after it, which results in faster and more complete learning.
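To make the idea concrete, here is a minimal, standalone sketch (the input shape of 10 timesteps with 8 features is made up for illustration): the Bidirectional wrapper runs one LSTM forward and one backward over the same sequence and, by default, concatenates their outputs, so 150 units become a 300-dimensional output per timestep.
from tensorflow.keras.layers import Bidirectional, LSTM, Input
from tensorflow.keras.models import Model

# Illustrative only: wrap an LSTM in Bidirectional and inspect the output shape.
# One LSTM reads left to right, a second reads right to left, and their outputs
# are concatenated, doubling the 150 units to 300 per timestep.
inputs = Input(shape=(10, 8))   # 10 timesteps, 8 features (made-up shape)
outputs = Bidirectional(LSTM(150, return_sequences=True))(inputs)
print(Model(inputs, outputs).output_shape)   # (None, 10, 300)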
Now let’s see how to implement this model in text generation.
Import the following libraries:
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import regularizers
import tensorflow.keras.utils as ku
import numpy as np
Text pre-processing
Here the whole text is cleaned, converted to lower case, and split into a corpus of sentences (one per line). Words are then tokenized, and the total number of words is determined. To learn more about tokenization, you can refer to my previous blog.
I am taking only a few lines of Donald Trump's speech here; I didn't take the whole speech because training this kind of model needs a lot of training time.
tokenizer = Tokenizer()
data = open('../input/dtspeech/DTSpeech.txt').read()
corpus = data.lower().split("\n")
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1
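As a quick illustration of what the tokenizer produces (the two lines below are made up, not taken from the speech file):
# Illustrative only: a made-up two-line corpus, not the actual speech
demo_tokenizer = Tokenizer()
demo_tokenizer.fit_on_texts(["we will make america great", "we will win"])
print(demo_tokenizer.word_index)            # {'we': 1, 'will': 2, 'make': 3, 'america': 4, 'great': 5, 'win': 6}
print(len(demo_tokenizer.word_index) + 1)   # total_words for this toy corpus: 7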
Creating Sequences
For each line, word-by-word n-gram sequences are made and appended to the input sequences; each iteration extends the sequence by the next word.
For example, for a sentence like ‘He was walking home’, first ‘He was’ is extracted, then ‘He was walking’, then ‘He was walking home’, and so on.
# create input sequences using list of tokens
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)
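To make this concrete, here is the same prefix logic applied to a single made-up token list (the ids are illustrative, not from the real tokenizer):
# Illustrative only: made-up token ids, e.g. for "he was walking home"
demo_tokens = [4, 2, 9, 7]
demo_sequences = [demo_tokens[:i + 1] for i in range(1, len(demo_tokens))]
print(demo_sequences)   # [[4, 2], [4, 2, 9], [4, 2, 9, 7]]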
Padding sequences
The maximum sequence length is extracted, and then all the shorter sequences are pre-padded to that length.
# pad sequences
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
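For instance, pre-padding the toy sequences from above to a length of 5 would look like this:
# Illustrative only: pre-padding short sequences to a common length of 5
print(pad_sequences([[4, 2], [4, 2, 9], [4, 2, 9, 7]], maxlen=5, padding='pre'))
# [[0 0 0 4 2]
#  [0 0 4 2 9]
#  [0 4 2 9 7]]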
Extract the last word of each sequence as the label and convert it from a numerical index to a categorical (one-hot) vector.
# create predictors and label
predictors, label = input_sequences[:,:-1], input_sequences[:,-1]
label = ku.to_categorical(label, num_classes=total_words)
Let’s now make a sequential model, with a word embedding layer as the first layer.
Then apply a bidirectional LSTM so that word generation takes into consideration both the previous words and the words coming ahead in the sequence; return_sequences is set to True so that the layer outputs its full sequence of hidden states for the next LSTM layer.
Next come a dropout layer to avoid overfitting, one more LSTM layer, and one more dense layer with ReLU activation and an L2 regularizer to guard against overfitting again.
The output layer uses softmax so as to get the probability of each word being the one predicted next.
model = Sequential()
model.add(Embedding(total_words, 100, input_length=max_sequence_len-1))
model.add(Bidirectional(LSTM(150, return_sequences = True)))
model.add(Dropout(0.2))
model.add(LSTM(100))
model.add(Dense(total_words // 2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(total_words, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Fit the model and train for 100 epochs.
history = model.fit(predictors, label, epochs=100, verbose=1)
Plot the accuracy and loss.
import matplotlib.pyplot as plt
acc = history.history['accuracy']
loss = history.history['loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'b', label='Training accuracy')
plt.title('Training accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'b', label='Training Loss')
plt.title('Training loss')
plt.legend()
plt.show()
Now let’s give a seed text for generation; the next 100 words are generated from it.
seed_text = " We will make America safe again, and we will make America great again."
next_words = 100
The seed text is tokenized and pre-padded into a token list just like the training sequences, and the model predicts the next word from that token list.
The most probable word is then appended to the seed text, and this is repeated for the next 100 words.
for _ in range(next_words):
    token_list = tokenizer.texts_to_sequences([seed_text])[0]
    token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
    # predict_classes was removed in newer TensorFlow versions, so take the
    # argmax of the predicted probability distribution instead
    predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1)[0]
    output_word = ""
    for word, index in tokenizer.word_index.items():
        if index == predicted:
            output_word = word
            break
    seed_text += " " + output_word
print(seed_text)
Output
We will make America safe again, and we will make America great again. big hello wisconsin they have and the worst they be here a lot some when and we're was this is the violent important reelection that it strongly this is the violent important election in the family ever wisconsin and they have to have in the violent big hello wisconsin the worst they should has is the year wisconsin to have to have your country you the violent left wing mob you biden very family had i nafta them and that's in a want we want to surrender when what the important reelection and a have to have your country to
The output is not perfect because we took only a few lines of text for training, so we can very well fine-tune it further.
This is how we generated text through a bidirectional LSTM; we can improve the model with more epochs, more text, GRUs (a sketch follows below), or even by adding attention layers.
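As an example of the GRU variant, here is a minimal sketch that keeps the same pipeline and copies the hyperparameters from the model above (they would still need tuning, so treat it as a starting point rather than a tuned implementation):
from tensorflow.keras.layers import GRU

# Sketch: the same architecture with GRU cells instead of LSTM cells;
# hyperparameters are copied from the model above, not tuned for GRUs.
gru_model = Sequential()
gru_model.add(Embedding(total_words, 100, input_length=max_sequence_len - 1))
gru_model.add(Bidirectional(GRU(150, return_sequences=True)))
gru_model.add(Dropout(0.2))
gru_model.add(GRU(100))
gru_model.add(Dense(total_words // 2, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
gru_model.add(Dense(total_words, activation='softmax'))
gru_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])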
Thanks for reading!