
We young data scientists are often in need of new projects to build up our professional portfolios. One of the easiest ways to decide what to build a project around is to scour the websites that house datasets. Kaggle is a great place to do that. Kaggle regularly runs competitions whose data you can use, but there is much more as well: it houses thousands upon thousands of datasets that were not necessarily uploaded for data science competitions.
When you do find the dataset of your dreams, you need to download it and then move the downloaded folder into your project folder. This is not hard by any means, but it can get a little annoying, for example if you don’t keep your data science projects folder readily available on your desktop.
A workaround I only recently thought of was to check whether Kaggle had an API for downloading datasets directly into your projects. A quick search and… of course Kaggle has an API; it’s a website made for data scientists. Another search led me to the official Kaggle API GitHub repository, which guides you through connecting via the command line. This wasn’t the most straightforward route for me, so I found an incredible tutorial on YouTube from the user Decision Forest, which is much simpler and involves installing the Kaggle API directly from your Jupyter notebook.
I’m going to walk you through installing the API the way that Decision Forest does, from a Jupyter notebook.
Using Jupyter, open whichever folder you want to house your new project in.
Type the following command into a code cell to install the Kaggle package:
!pip install kaggle
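If you want to confirm that the install landed in the environment your notebook kernel is actually using, a small check like the one below can help. It is a minimal sketch built on Python’s standard `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is my own and is not part of the Kaggle package. In Jupyter, the `%pip install kaggle` magic is also worth knowing, since it installs into the running kernel’s environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run in a cell after the install: prints a version string,
# or None if the package is not visible to this kernel.
print(installed_version("kaggle"))
```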
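Then create a Kaggle folder named .kaggle in your user (home) folder. The leading dot means it is normally hidden, and the home folder is where the Kaggle API looks for its credentials by default, so there is no need to bury it anywhere else.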
!mkdir -p ~/.kaggle
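The `!mkdir` line relies on a Unix-style shell, so it may not behave the same way in a notebook running on Windows. As a hedged alternative, here is a sketch that does the same job from plain Python with `pathlib`. The function name `ensure_kaggle_dir` and its optional `home` argument are hypothetical; the argument exists only so the function can be pointed at a scratch folder for testing.

```python
from pathlib import Path
from typing import Optional


def ensure_kaggle_dir(home: Optional[Path] = None) -> Path:
    """Create <home>/.kaggle if it is missing and return its path.

    `home` defaults to the current user's home folder (hypothetical
    parameter, only there so the function can be tried on a scratch dir).
    """
    base = Path.home() if home is None else Path(home)
    kaggle_dir = base / ".kaggle"
    # exist_ok=True: running the cell twice does not raise an error
    kaggle_dir.mkdir(parents=True, exist_ok=True)
    return kaggle_dir
```

In a notebook cell, calling `ensure_kaggle_dir()` with no argument creates the folder in your home directory and returns its path.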
Next, log in to the Kaggle website to fetch an API token. Click your profile picture in the top right-hand corner and select “Account”.
Scroll down through your account settings to the box labeled “API”, shown below.
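Generating a token from that box has typically given you a small file called `kaggle.json` containing your `username` and `key`, and that file belongs in the `.kaggle` folder from earlier. Once it is there, a quick sanity check like the following can save some head-scratching. This is a sketch under that assumption; `check_kaggle_token` is a hypothetical helper, not part of the Kaggle API.

```python
import json
from pathlib import Path
from typing import Optional


def check_kaggle_token(token_path: Optional[Path] = None) -> bool:
    """Return True if the token file exists and holds 'username' and 'key'.

    Defaults to ~/.kaggle/kaggle.json (assumed default location).
    """
    if token_path is None:
        path = Path.home() / ".kaggle" / "kaggle.json"
    else:
        path = Path(token_path)
    if not path.is_file():
        return False
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    # Both fields must be present in a JSON object
    return isinstance(data, dict) and all(k in data for k in ("username", "key"))
```

Calling `check_kaggle_token()` with no argument checks `~/.kaggle/kaggle.json`. On macOS and Linux, the Kaggle command-line tool may also warn if that file is readable by other users; running `chmod 600 ~/.kaggle/kaggle.json` restricts it to your account.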