Song lyrics generation with Artificial Intelligence (RNN)

Note: If you know enough about Machine Learning and Recurrent Neural Networks (and you are just interested in the code), please skip this part.

Let’s try to give a brief (non-technical) overview of the theory behind this project.

The term Artificial Intelligence covers so many things that it ends up saying almost nothing. Generally, it refers to the entire set of techniques used to build some ability that we could, in some way, call “intelligent” (vague, right?). Let’s try to be more precise.

For this particular application of that vague term, I’ve used Recurrent Neural Networks. What are these beasts?

Let’s take a step back. Neural Networks are Machine Learning algorithms that learn in layers, somewhat like humans do (from simple concepts to more complex ones).

Recurrent Neural Networks are a specific kind of Neural Network built for data that has to be read as a sequence, in order. Say you want to predict tomorrow’s highest temperature from the temperatures of the previous days. One way to do that is with a Recurrent Neural Network. This example may seem trivial, but it is actually the same thing we want to do here:

From a given sequence as an input, predict the next word, then the next word, then the next word…
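To make “recurrent” a bit more concrete, here is a minimal NumPy sketch of a plain recurrent cell (not the GRU used later in this project). The same weights are applied at every step, and a hidden state carries information from earlier elements of the sequence to later ones. The sizes and random weights are placeholders, not values from this project.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain (Elman-style) RNN over a sequence.

    inputs: array of shape (seq_len, input_dim)
    Returns all hidden states, shape (seq_len, hidden_dim).
    """
    h = np.zeros(W_h.shape[0])  # initial hidden state: "nothing seen yet"
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND on the previous state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)

# Placeholder sizes and random weights, for illustration only
rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(seq_len, input_dim))
states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (5, 8)
```

The last hidden state depends on every element of the sequence, which is what lets a recurrent model use the words seen so far when it predicts the next one.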

Let’s play. 🙂

The dataset I’ve used is courtesy of Manva Pradhan, and you can find it here

Yes, I’ve picked Taylor Swift choruses to train my model. And it is not (just) because she melts my heart every time she sings.

These methods work extremely well if you train your model on a lot of data (you may have encountered the term “Big Data”). The drawback is that getting a decent result out of a lot of data requires a lot of computational power. So I’ve used a single singer and based my model on that.

But why did I use Taylor Swift?

I did that because her choruses are actually “easy to get”: she doesn’t use solemn words or over-sophisticated meter. That’s about it. You could use Ed Sheeran, Justin Bieber, or someone else (actually, the best thing would be to use them all together to build a more powerful model).

Let’s take a look at the dataset:

So you have the entire lyrics, with a line_number for each line of each song. But we want to write new choruses, so eventually we have to keep only the choruses (we will do that, keep calm). In every dataset I’ve worked with, I’ve always found something strange that messes up the model. Unfortunately, this one is no exception.

The same album appears multiple times under different names, which is a real problem. We can fix this with a few lines:
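The original fix isn’t reproduced here, so this is only a minimal pandas sketch of one common way to do it: map every variant spelling onto a single canonical album name. The column names (`album`, `line_number`, `lyric`) and the album names are made up for illustration; the real dataset’s columns and duplicates may differ.

```python
import pandas as pd

# Toy stand-in for the real dataset: column names and values are hypothetical.
df = pd.DataFrame({
    "album": ["Album A", "album a", "Album A (Deluxe)", "Album B"],
    "line_number": [1, 2, 1, 1],
    "lyric": ["line one", "line two", "line three", "line four"],
})

# Map every variant spelling onto one canonical album name.
album_aliases = {
    "album a": "Album A",
    "Album A (Deluxe)": "Album A",
}
df["album"] = df["album"].replace(album_aliases)

print(sorted(df["album"].unique()))  # ['Album A', 'Album B']
```

An explicit mapping is easy to audit: you can see exactly which names were merged, instead of relying on fuzzy matching.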

OK, we’re good. Now, if you look at the start of each verse, chorus, or bridge, you’ll find a marker like [Verse], [Chorus], or [Bridge] (this notation is used pretty much everywhere; it is super basic).

So let’s add another column that flags (1/0) the lines containing that ‘[’ marker.
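A minimal pandas sketch of that flag column, assuming the lyric text lives in a column called `lyric` (a hypothetical name) and using made-up rows. Passing `regex=False` matters here, because `[` would otherwise be read as the start of a regex character class.

```python
import pandas as pd

# Made-up rows; "lyric" is a hypothetical column name.
df = pd.DataFrame({"lyric": [
    "[Verse 1]", "first verse line", "[Chorus]",
    "chorus line one", "chorus line two", "[Bridge]", "bridge line",
]})

# 1 if the line contains a section marker such as "[Chorus]", else 0.
# regex=False makes "[" a literal character instead of a regex bracket.
df["is_marker"] = df["lyric"].str.contains("[", regex=False).astype(int)

print(df["is_marker"].tolist())  # [1, 0, 1, 0, 0, 1, 0]
```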

Awesome, now we just have to pick the choruses. We start from the ‘[’ lines that mark a chorus (remember that you stored those in that IND list) and move forward line by line, stopping when we reach the next bracketed section marker.
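Here is a minimal sketch of that walk, with made-up rows and a hypothetical `lyric` column. `chorus_idx` plays the role of the IND list; it assumes the rows are in song order and that every section (including the first one of each song) starts with a bracketed marker, so a chorus never runs into the next song.

```python
import pandas as pd

# Toy rows standing in for the real lyrics (hypothetical column name "lyric").
df = pd.DataFrame({"lyric": [
    "[Verse 1]", "verse line",
    "[Chorus]", "chorus line one", "chorus line two",
    "[Bridge]", "bridge line",
    "[Chorus]", "chorus line three",
]})
df["is_marker"] = df["lyric"].str.contains("[", regex=False).astype(int)

# Row positions of the chorus markers (the role played by the IND list).
is_chorus_marker = df["lyric"].str.lower().str.startswith("[chorus")
chorus_idx = df.index[is_chorus_marker.to_numpy()].tolist()

choruses = []
for start in chorus_idx:
    lines = []
    j = start + 1
    # Walk forward until the next section marker or the end of the data.
    while j < len(df) and df.at[j, "is_marker"] == 0:
        lines.append(df.at[j, "lyric"])
        j += 1
    choruses.append(lines)

print(choruses)
# [['chorus line one', 'chorus line two'], ['chorus line three']]
```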

Again, there is some mess to clean up here:

Here:

And here:

With this line:

And the die is cast.

This is an example:

These models are complex to build, and unless you are a researcher, you’ll rarely build a Neural Network from scratch. Here’s the Recurrent Neural Network I’ve used.

The first step is not easy to grasp right away, as it is pretty technical: it is a series of techniques that turn strings into numbers the model can “read”.

It isn’t worth going deeper into it here, but here’s the commented TensorFlow code:
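The TensorFlow code itself isn’t shown here, so this is a plain NumPy sketch of the same idea, under the assumption of word-level tokens (to match the “predict the next word” framing). Every distinct word gets an integer id, and the training pairs are windows of ids where the target is the input shifted by one word. The text and the helper name `make_examples` are made up for illustration.

```python
import numpy as np

# Made-up text; in the project this would be the cleaned chorus corpus.
text = "the rain falls and the rain stops"
words = text.split()

# Vocabulary: every distinct word gets an integer id (and back).
vocab = sorted(set(words))
word2idx = {w: i for i, w in enumerate(vocab)}
idx2word = np.array(vocab)

encoded = np.array([word2idx[w] for w in words])

def make_examples(encoded, seq_len):
    """Slide a window over the ids: each target is the input shifted by one word."""
    inputs, targets = [], []
    for start in range(len(encoded) - seq_len):
        chunk = encoded[start:start + seq_len + 1]
        inputs.append(chunk[:-1])
        targets.append(chunk[1:])
    return np.array(inputs), np.array(targets)

inputs, targets = make_examples(encoded, seq_len=3)
print(inputs[0], targets[0])          # [4 2 1] [2 1 0]
print(" ".join(idx2word[inputs[0]]))  # the rain falls
```

Shifting by one is what turns plain text into a supervised problem: at every position, the “label” is simply the word that actually came next.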

Then comes the interesting part. Words are represented as vectors, learned so that they capture the meaning of each word as well as possible (this method is called embedding). Then you use Gated Recurrent Units (GRUs), cells that can “remember” information about the previous words in a clever way, using learned gates that decide what to keep and what to forget. Finally, a dense layer outputs logits: one score per word in the vocabulary, telling you which word is most likely to come next. Isn’t that awesome?
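To show what the three pieces do, here is an untrained NumPy sketch of the forward pass: embedding lookup, then a GRU cell, then a dense layer producing logits. It uses random weights and one common GRU formulation (the update convention used by Keras). In TensorFlow you would normally reach for `tf.keras.layers.Embedding`, `tf.keras.layers.GRU` and `tf.keras.layers.Dense` instead. The class name and sizes here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRULanguageModel:
    """Hypothetical, untrained embedding -> GRU -> dense model (random weights)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # One learned vector per word id
        self.embedding = rng.normal(scale=s, size=(vocab_size, embed_dim))
        # Weights for the update (z), reset (r) and candidate (h) parts
        self.W = {g: rng.normal(scale=s, size=(hidden_dim, embed_dim)) for g in "zrh"}
        self.U = {g: rng.normal(scale=s, size=(hidden_dim, hidden_dim)) for g in "zrh"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrh"}
        # Dense layer: hidden state -> one logit per word in the vocabulary
        self.W_out = rng.normal(scale=s, size=(vocab_size, hidden_dim))
        self.b_out = np.zeros(vocab_size)
        self.hidden_dim = hidden_dim

    def gru_step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])  # update gate
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])  # reset gate
        h_cand = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        # z decides how much of the old state to keep vs. the new candidate
        return z * h + (1.0 - z) * h_cand

    def forward(self, token_ids):
        h = np.zeros(self.hidden_dim)
        logits = []
        for t in token_ids:
            x = self.embedding[t]                         # embedding lookup
            h = self.gru_step(x, h)                       # recurrent update
            logits.append(self.W_out @ h + self.b_out)    # dense layer -> logits
        return np.stack(logits)

model = TinyGRULanguageModel(vocab_size=5, embed_dim=4, hidden_dim=8)
logits = model.forward([4, 2, 1])
print(logits.shape)  # (3, 5)
next_word_id = int(np.argmax(logits[-1]))  # most likely next word (meaningless until trained)
```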

Graph generated by TensorFlow

Of course, these methods are “magical but not magic”, so they need to be trained, for quite a long time. Specifically, they are trained to minimize a loss function that you attach to your optimizer:
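For next-word prediction the usual choice is sparse categorical cross-entropy on the logits; in Keras that is `tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)`. The loss actually used in the project isn’t shown here, so treat this NumPy version as a sketch of what that loss computes: the average negative log-probability the model assigns to the word that really came next.

```python
import numpy as np

def sparse_cross_entropy_from_logits(logits, targets):
    """Mean negative log-likelihood of the target word ids.

    logits: (seq_len, vocab_size) raw scores from the dense layer
    targets: (seq_len,) integer ids of the words that actually came next
    """
    # Numerically stable log-softmax: subtract the row maximum first
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick, for each position, the log-probability of the correct next word
    picked = log_probs[np.arange(len(targets)), targets]
    return float(-picked.mean())

# A model with no preference (all logits equal) over 4 words pays log(4) per word
uniform = np.zeros((2, 4))
print(round(sparse_cross_entropy_from_logits(uniform, np.array([0, 3])), 4))  # 1.3863

# Confident, correct predictions give a much lower loss
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
targets = np.array([0, 2])
print(sparse_cross_entropy_from_logits(logits, targets))
```

Training nudges the weights so that this number goes down, which means the correct next word gets more and more probability.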

Trained model right here:

And this is the last step:

So the inputs are:

The trained model
The start string (remember: the model is “recurrent”)
The temperature

This last input is actually amazing: with a low temperature you get predictable results, and as you increase it, your lyrics become more “creative”. You don’t believe me, right? You will. 🙂
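Here is a minimal NumPy sketch of how temperature is usually applied, assuming the model returns logits for the next word: divide the logits by the temperature, turn them into probabilities with a softmax, and sample. Low temperatures sharpen the distribution towards the top word; high temperatures flatten it. The `generate` loop and the toy `next_logits` function are hypothetical stand-ins for the trained model, fed with the ids of the start string.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Pick a word id from logits after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

def generate(next_logits, start_ids, num_words, temperature, seed=0):
    """Autoregressive loop: predict a word, append it, feed the sequence back in.

    next_logits is a hypothetical callable: list of word ids -> logits for the next word.
    """
    rng = np.random.default_rng(seed)
    ids = list(start_ids)
    for _ in range(num_words):
        word_id, _ = sample_with_temperature(next_logits(ids), temperature, rng)
        ids.append(word_id)
    return ids

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.1])
_, cold = sample_with_temperature(logits, 0.2, rng)  # almost all mass on word 0
_, hot = sample_with_temperature(logits, 5.0, rng)   # much flatter distribution
print(cold.round(3), hot.round(3))

# Toy "model" that always prefers word 1, sampled at a very low temperature
print(generate(lambda ids: np.array([0.0, 5.0, 0.0]), start_ids=[0], num_words=4, temperature=0.05))
```

This is why a low temperature tends to reproduce the most probable (often already existing) lines, while a high one lets unlikely words slip in.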

You’re probably thinking: “Hey man, this is enough. Give me your lyrics”.

You’re right, bad boy/girl. Here are three examples, with different temperatures and different inputs:

As I told you, if you increase the temperature you risk getting nonsense lyrics like “Say a mind of my friends are saying”. On the other hand, a low temperature tends to reproduce existing lyrics, so you have to be careful and tune both the temperature and the start string.

If you want to get more technical, you could use LSTM cells instead of GRUs, train on a more powerful machine, or change the data preprocessing.

We are skeptical about “AI writing songs”, and there is a reason why. We like to think that Music, Art, Poetry, and Cinema are not about numbers, equations, and computers, but belong to a different part of ourselves: the creative and passionate one.

As a musician and data scientist, I’m really confused. I would like to think that when I listen to my favourite album and get goosebumps, it is because there is something more to the music than a good mix of sounds and words accurately predicted by a logit function. But isn’t Artificial Intelligence a form of art in itself? Does this “art” actually exist? Do these feelings actually exist? Well, I do have feelings for Taylor Swift though.

OK, I’ve gone way too far. As always, hit me up at piero.paialunga@hotmail.com if you have anything you would like to share with me about this project (literally, anything).

Thank you 🙂
