Music recordings are made up of a variety of tracks, such as lead vocals, piano, drums, bass, etc. Each track is called a stem and, for most people, isolating those stems comes quite naturally when listening to a song. For example, if you listen to Old Time Rock and Roll, you will hear the piano stem at the beginning, then Bob Seger’s lead vocal stem and the drum stem come in, and you can follow each stem throughout the song. However, it’s not possible to hear a single stem on its own without it being obscured by the many other stems. This is where Spleeter comes into play…
What is Spleeter?
Spleeter is a source separation library that the music-streaming company Deezer released in 2019. For those unfamiliar with Deezer, it is very similar to Spotify and is mostly used in France. Spleeter is the closest we can get to extracting the individual tracks of a song, and it’s mostly used by researchers working on Music Information Retrieval. Deezer uses Spleeter for its own research and wanted to release something accessible for others to use in their own ways as well.
What can you do with Spleeter?
Besides research, you can do other things with Spleeter:
- make acapellas/instrumentals (karaoke as well)
- use it to extract acoustics/instrumentals to create your own version of a song you like (not copy, of course)
- use it to learn how a song comes together from the individual stems (source separation makes the stems sound much less distorted)
- play around mixing different tracks of artists you like; create mashups
- if you play piano, drums, or bass, you can extract those specific tracks to hear and understand them clearly and play/create something similar
To follow along or execute this code yourself, you can find my Spleeter Google Colab on GitHub. I originally tried executing this in a Jupyter Notebook but, for reasons I am not sure of yet, it didn’t work there; it worked fine on Google Colab. If you try this in a Python application, let me know if it works for you so I can figure out what I’m missing!
Quick Overview of my Google Colab Code
I installed and imported 3 important libraries: youtube-dl, Spleeter, and pydub (the install commands I ran are sketched after the list below).
- pydub: a module for working with audio files, specifically .wav files (Spleeter writes each stem as a .wav file)
- youtube-dl: for downloading video and audio files from YouTube
- Spleeter: for extracting the music stems
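A minimal sketch of the install step, assuming you are running it in a Colab cell (all three packages are available on PyPI under these names):
!pip install spleeter
!pip install youtube-dl
!pip install pydub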
You also want to import Audio from the IPython.display module to create an Audio object for your downloaded song, as well as the display function to display your audio.
import spleeter
import youtube_dl
import pydub
import IPython.display as ipd
from IPython.display import Audio, display
from IPython.display import HTML
Next, you have to create a variable to store the youtube-dl options, such as the “bestaudio” format and the output filename you want the download to have. You can learn more about the different options you can set and the documentation for the youtube-dl library here.
ydl_args = {
    'format': 'bestaudio/best',
    'outtmpl': 'filename.mp3'
}
Next, I created an instance of youtube_dl’s YoutubeDL class with those arguments and used it to download the song.
ydl = youtube_dl.YoutubeDL(ydl_args)
ydl.download(['url'])
This next step is where Spleeter does its job. For more information about its documentation, go here. Spleeter gives you 3 options depending on how many stems you want to extract: 2stems, 4stems, and 5stems.
- 2stems = vocals and accompaniment
- 4stems = vocals, drums, bass, and other
- 5stems = vocals, drums, bass, piano, and other
I used 4stems for my project because I knew the song I chose mostly had vocals and drums.
!spleeter separate -p spleeter:4stems -o output/ entertainment.mp3
“-o” specifies the output directory and “-p” provides the pretrained model configuration (4stems).
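If you prefer to stay in Python instead of running the shell command, Spleeter also exposes a Separator class. Here is a minimal sketch of the equivalent call, assuming the same input file and output directory as above:
from spleeter.separator import Separator

# Load the 4stems pretrained model and write each stem into output/
separator = Separator('spleeter:4stems')
separator.separate_to_file('entertainment.mp3', 'output/')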
When you execute the Spleeter command, each stem will be written as a .wav file. Since I personally wanted to work with .mp3 files, I imported pydub’s AudioSegment class, which can read .wav audio files. After this, we can export that file as an .mp3 file.
from pydub import AudioSegment
sound = AudioSegment.from_wav("output/filename/vocals.wav")
sound.export("output/filename/vocals.mp3", format="mp3")
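Since the 4stems model writes vocals.wav, drums.wav, bass.wav, and other.wav into the output folder, a short loop can convert them all at once. This is just a sketch that assumes the same "output/filename/" folder used above:
from pydub import AudioSegment

# Convert every stem produced by the 4stems model from .wav to .mp3
for stem in ["vocals", "drums", "bass", "other"]:
    sound = AudioSegment.from_wav(f"output/filename/{stem}.wav")
    sound.export(f"output/filename/{stem}.mp3", format="mp3")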
If you repeat the process above for each stem (vocals, drums, bass, etc.), the last thing left to do is use IPython.display (shortened to ipd) to display your final stems. It should look something like this:
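As a sketch of that display step, assuming the same output folder and the .mp3 files produced above:
# Render an audio player for each stem in the notebook
for stem in ["vocals", "drums", "bass", "other"]:
    print(stem)
    display(ipd.Audio(f"output/filename/{stem}.mp3"))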
That’s all there is to it! How you want to listen to and play around with these files is up to you. I uploaded my stems to the app Audacity because it displays each stem separately and lets you turn certain stems on or off or play them all together. Here’s what it looks like if you want to use Audacity as well:
Conclusion
I hope this provided a clear picture of what Spleeter does and how to use it. If you are going to publish anything (music/audio) that you made with Spleeter, make sure to do it with the permission of the owner! I enjoyed this project a lot; however, I noticed that Spleeter is not talked about much, so part of my intent with this article is to show how easily accessible it is to everyone and spread the word. It could also serve as a perfect introduction to Data Science and Artificial Intelligence for beginners who love the intricacies of music.