Building simulations is not only really cool, but also quite relevant given the pandemic! Not only are Python simulations very beneficial to your coding fluency and your understanding of data science, but they are also fun and addictive to play around with.
There is a myriad of scenarios and factors you can simulate, often in fewer than a couple hundred lines of code. For example, I have articles on simulating a basic pandemic and on predicting population control, both of which include code you’re free to look at!
Difficulty: Anywhere from trivial to super complex!
Where to start:
Skills you’ll learn:
- Object-oriented programming
- Simulating randomness in Python
- Modelling real-life scenarios
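To show how object-oriented programming and randomness fit together, here is a minimal sketch of a pandemic-style simulation. It assumes a very simplified susceptible/infected/recovered model. Every number in it is a hypothetical parameter chosen for illustration: the daily contacts, the transmission probability and the recovery probability.

```python
import random
from dataclasses import dataclass


@dataclass
class Person:
    state: str = "S"  # "S" = susceptible, "I" = infected, "R" = recovered


class Population:
    def __init__(self, size, initial_infected, seed=None):
        self.rng = random.Random(seed)  # seeded RNG so runs are reproducible
        self.people = [Person() for _ in range(size)]
        for person in self.rng.sample(self.people, initial_infected):
            person.state = "I"

    def step(self, contacts_per_day, p_transmit, p_recover):
        """Advance the simulation by one day."""
        # Snapshot who is infected at the start of the day, so people
        # infected today only start spreading tomorrow.
        infected = [p for p in self.people if p.state == "I"]
        for person in infected:
            # Each infected person meets a few random members of the population.
            for other in self.rng.sample(self.people, contacts_per_day):
                if other.state == "S" and self.rng.random() < p_transmit:
                    other.state = "I"
            # Each infected person recovers with a fixed daily probability.
            if self.rng.random() < p_recover:
                person.state = "R"

    def counts(self):
        return {s: sum(p.state == s for p in self.people) for s in "SIR"}


pop = Population(size=1000, initial_infected=5, seed=42)
history = [pop.counts()]
for day in range(60):
    pop.step(contacts_per_day=5, p_transmit=0.05, p_recover=0.1)
    history.append(pop.counts())

print(history[-1])
```

Plotting `history` over time gives the familiar epidemic curve. From there you can extend `Person` with ages, locations or vaccination status and watch how the curve changes.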
Banner vector created by katemangostar — www.freepik.com
Although it is interesting to simulate the spread of disease or social dynamics, we can find uses of data science and programming in business, too.
Forecasting sales for holidays, like Christmas, is incredibly important for determining how much to produce. Too much and there’s stale inventory. Too little and you’ve lost out on potential revenue.
Below are several resources for you to learn and practice retail sales forecasting.
Difficulty: Intermediate
Where to start:
Skills you’ll learn:
- Predictive modelling, possibly time-series forecasting as well
- Understanding business statistics
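Before reaching for a complex model, it helps to have a baseline that a fancier forecast has to beat. A common one for strongly seasonal sales is the seasonal naive forecast, which predicts each month as the value from the same month a year earlier. The sketch below uses hypothetical monthly sales with a December spike.

```python
def seasonal_naive_forecast(history, season_length=12, horizon=12):
    """Forecast each future period as the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly unit sales (Jan..Dec) with a December spike.
year_1 = [100] * 11 + [300]
year_2 = [110] * 11 + [330]

forecast = seasonal_naive_forecast(year_1)
print(forecast[-1])                           # forecast for December
print(mean_absolute_error(year_2, forecast))  # baseline error to beat
```

Any time-series model you build later can be scored with the same error metric. If it can't beat this baseline, it isn't adding value.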
Photo by Pascal Meier on Unsplash
In a similar vein, let’s bring together science and business to improve our data science skills with a crucial, real-life scenario. In the past nine months, Covid-19 has hugely changed the way we live our lives; in particular, it has had a massive impact on travel worldwide. With the dataset below, explore the data, create visualizations, and even see if you can build a prediction model for airport traffic.
Difficulty: Easy
Where to start:
Skills you’ll learn:
- Exploratory Data Analysis
- Data Visualizations
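A first exploratory pass usually means aggregating by airport and reshaping by date for plotting. In the sketch below, a tiny hand-made DataFrame stands in for the real CSV. The column names `Date`, `AirportName` and `PercentOfBaseline` are assumptions about the dataset's layout, so check them against the file you download.

```python
import pandas as pd

# Stand-in for pd.read_csv(...) on the downloaded dataset; the column names
# below ("Date", "AirportName", "PercentOfBaseline") are assumed, so check
# them against the real file.
df = pd.DataFrame({
    "Date": ["2020-04-01", "2020-04-02", "2020-04-01", "2020-04-02"],
    "AirportName": ["Airport A", "Airport A", "Airport B", "Airport B"],
    "PercentOfBaseline": [40, 50, 80, 90],
})
df["Date"] = pd.to_datetime(df["Date"])

# Quick overview: which airports were hit hardest on average?
summary = (
    df.groupby("AirportName")["PercentOfBaseline"]
    .agg(["mean", "min", "max"])
    .sort_values("mean")
)
print(summary)

# One column per airport, one row per day: ready for a line chart,
# e.g. trend.plot() if matplotlib is installed.
trend = df.pivot_table(index="Date", columns="AirportName", values="PercentOfBaseline")
print(trend)
```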
If you already use Tweetdeck, this project is for you! Tweetdeck is a Twitter tool that lets you track your engagement and a variety of other insights in real time. Using the Twitter API and a visualization tool like Dash or Streamlit, you can build a simple web application that serves as your own Twitter analytics platform!
Difficulty: Intermediate
Where to start:
- Get familiar with Tweetdeck
- Learn how to work with APIs and request an API key from Twitter
- Learn a visualization tool, like Dash or Streamlit, to deploy your dashboards
Skills you’ll learn:
- Working with APIs
- Creating interactive insights and analytics dashboards
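Once tweets come back from the API, most dashboard panels reduce to aggregating engagement counts over time. The records below are hypothetical samples, shaped loosely like tweet objects requested with a `public_metrics` field. Treat that shape as an assumption and check it against the current API documentation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records, shaped loosely like tweet objects requested
# with the public_metrics field; check the current API docs for the exact shape.
tweets = [
    {"created_at": "2020-11-01T09:15:00.000Z",
     "public_metrics": {"like_count": 12, "retweet_count": 3, "reply_count": 1, "quote_count": 0}},
    {"created_at": "2020-11-01T18:40:00.000Z",
     "public_metrics": {"like_count": 30, "retweet_count": 5, "reply_count": 4, "quote_count": 1}},
    {"created_at": "2020-11-02T12:00:00.000Z",
     "public_metrics": {"like_count": 7, "retweet_count": 0, "reply_count": 2, "quote_count": 0}},
]


def daily_engagement(tweets):
    """Sum each engagement metric per calendar day (UTC)."""
    totals = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        day = date.fromisoformat(tweet["created_at"][:10])  # "YYYY-MM-DD" prefix
        totals[day]["tweets"] += 1
        for metric, value in tweet["public_metrics"].items():
            totals[day][metric] += value
    return {day: dict(metrics) for day, metrics in sorted(totals.items())}


for day, metrics in daily_engagement(tweets).items():
    print(day, metrics)
```

In a Streamlit app, you could convert the result with `pd.DataFrame.from_dict(result, orient="index")` and pass it to `st.line_chart` to get a live engagement chart.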
Photo by Joe Yates on Unsplash
Arguably one of the most practical data science concepts in the workplace is A/B testing. Yet it is often misunderstood, because there are many intricacies to it.
More specifically, click-through rate is an extremely important metric for any company with a marketing team. By properly measuring click-through rates, you can optimize the appearance, the messaging, and anything else related to your online advertisements.
Difficulty: Intermediate
Where to start:
Skills you’ll learn:
- Exploratory Data Analysis
- How to conduct a proper A/B test for click-through rates
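One standard way to compare the click-through rates of two ad variants is a pooled two-proportion z-test. The sketch below makes two assumptions: impressions are independent, and both samples are large. The click and view counts are hypothetical. One of the intricacies mentioned above: fix the sample size and significance level before the test starts, because stopping as soon as p drops below 0.05 inflates false positives.

```python
import math


def two_proportion_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-sided pooled z-test for a difference in click-through rates."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Under the null hypothesis both variants share one CTR, so pool the data.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


ctr_a, ctr_b, z, p = two_proportion_ztest(200, 10_000, 250, 10_000)
print(f"CTR A={ctr_a:.2%}  CTR B={ctr_b:.2%}  z={z:.2f}  p={p:.4f}")
```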
Photo by Thibault Penin on Unsplash
The recommendation algorithms used by modern social media platforms and content aggregators are extremely complex and constantly developing. What better way to understand how they work and improve themselves than by building one yourself?
Difficulty: Intermediate-Advanced
Where to start:
Skills you’ll learn:
- Building Recommendation Systems
- SVD, matrix factorization
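The idea behind SVD-based recommenders is to approximate the sparse user-item rating matrix with a few latent factors. The sketch below uses the simplest variant on a tiny hypothetical rating matrix:

- centre each user's ratings,
- fill missing entries with that user's mean,
- keep the top `k` singular vectors.

Matrix factorization as popularized by the Netflix Prize (e.g. Funk SVD) instead fits factors to observed ratings only, typically with SGD or alternating least squares.

```python
import numpy as np


def svd_predict(ratings, k):
    """Predict every user-item rating with a rank-k truncated SVD.

    ratings: users x items array, np.nan marks items a user has not rated.
    """
    known = ~np.isnan(ratings)
    user_means = np.nanmean(ratings, axis=1, keepdims=True)
    # Center each user's ratings and fill the gaps with 0 (i.e. the user's mean).
    centered = np.where(known, ratings - user_means, 0.0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the k strongest latent factors, then add the means back.
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :] + user_means


def recommend(ratings, user, k=2):
    """Return the index of the unrated item with the highest predicted rating."""
    predictions = svd_predict(ratings, k)
    unrated = np.flatnonzero(np.isnan(ratings[user]))
    return int(unrated[np.argmax(predictions[user, unrated])])


# Hypothetical ratings: 4 users x 4 items, np.nan = not rated yet.
ratings = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 4, 1],
    [1, 1, 2, 5],
    [np.nan, 1, 1, 4],
])
print(np.round(svd_predict(ratings, k=2), 2))
print(recommend(ratings, user=0))
```

With `k` equal to the full rank, the known ratings are reproduced exactly. Smaller `k` forces the model to generalize, which is what produces useful predictions for the blanks.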
Image by mcmurryjulie — Pixabay
Web scraping is simple to learn and extremely useful! Scraping a customer review website, like Trustpilot, is valuable for a company because it lets them understand review trends (getting better or worse) and see what customers are saying via NLP.
Difficulty: Easy
Where to start:
Skills you’ll learn:
- Web scraping
- Analyzing customer reviews
- Take it further and apply NLP to extract insights from reviews.
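The core of any scraper is turning HTML into structured records. This dependency-free sketch uses the standard library's `html.parser`; in practice many people use `requests` plus BeautifulSoup instead. The markup and class names (`review`, `rating`, `review-text`, `data-rating`) are hypothetical. Real review sites use their own markup, which changes over time, so inspect the page first.

```python
from html.parser import HTMLParser

# Hypothetical markup: real review sites use different (and changing) class names,
# so inspect the page in your browser's developer tools first.
SAMPLE_HTML = """
<article class="review">
  <div class="rating" data-rating="5"></div>
  <p class="review-text">Fast delivery and great support.</p>
</article>
<article class="review">
  <div class="rating" data-rating="2"></div>
  <p class="review-text">Package arrived late.</p>
</article>
"""


class ReviewParser(HTMLParser):
    """Collect {"rating", "text"} dicts from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_text = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "review" in classes:
            self.reviews.append({"rating": None, "text": ""})
        elif "rating" in classes and self.reviews:
            self.reviews[-1]["rating"] = int(attrs.get("data-rating") or 0)
        elif "review-text" in classes and self.reviews:
            self._in_text = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_text:
            self._in_text = False
            self.reviews[-1]["text"] = self.reviews[-1]["text"].strip()

    def handle_data(self, data):
        if self._in_text:
            self.reviews[-1]["text"] += data


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
average = sum(r["rating"] for r in parser.reviews) / len(parser.reviews)
print(parser.reviews)
print(f"average rating: {average:.1f}")
```

For a live page, you would feed the parser the downloaded HTML, for example `urllib.request.urlopen(url).read().decode("utf-8")`. The collected texts are then ready for sentiment analysis or other NLP.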
Image by Harish Sharma from Pixabay
What do you know, we’ve come full circle back to retail analytics! In this problem, however, the goal is to use statistics to cluster customers into similar groups so that you can identify the customer segments you want to market your business to!
Difficulty: Intermediate-Advanced
Where to start:
Skills you’ll learn:
- Clustering techniques
- Dimensionality reduction
Photo by Matthew Henry on Unsplash
This dataset is composed of power consumption data from PJM’s website. PJM is a regional transmission organization in the United States. Using this dataset, see if you can build a time series model to predict energy consumption. Beyond that, see if you can find patterns by hour of day, holiday energy usage, and long-term trends!
Difficulty: Medium-Advanced
Where to start:
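Finding hourly, holiday and long-term patterns starts with calendar features. The same features also make good inputs to a forecasting model later. In this sketch, a synthetic hourly series stands in for the real file. The column name `PJME_MW` is an assumption about how the dataset's regional files are named.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Synthetic hourly load standing in for the real file; the column name PJME_MW
# is an assumption about the dataset's regional files.
idx = pd.date_range("2018-12-20", "2019-01-05 23:00", freq=pd.offsets.Hour())
rng = np.random.default_rng(0)
hours = np.asarray(idx.hour)
load = 30_000 + 5_000 * np.sin(2 * np.pi * (hours - 6) / 24) + rng.normal(0, 200, len(idx))
df = pd.DataFrame({"PJME_MW": load}, index=idx)

# Calendar features for trend exploration and as model inputs.
df["hour"] = df.index.hour
df["dayofweek"] = df.index.dayofweek
holidays = USFederalHolidayCalendar().holidays(start=idx.min(), end=idx.max())
df["is_holiday"] = df.index.normalize().isin(holidays)

hourly_profile = df.groupby("hour")["PJME_MW"].mean()
print(hourly_profile.idxmax())  # hour of day with the highest average load
print(df.groupby("is_holiday")["PJME_MW"].mean())

# Daily averages smooth out the hourly cycle and expose longer-term trends.
daily = df["PJME_MW"].resample("D").mean()
print(daily.head())
```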
Photo by Ishant Mishra on Unsplash
What if you want to predict whether Tesla stock will shoot to the mooooon? With time series forecasting, you can try to predict the trajectory of a stock. To make it easier, you can use Prophet, Facebook’s time-series library, which does a lot of the heavy lifting for you.
Difficulty: Intermediate
Where to start:
Skills you’ll learn:
- More time series knowledge
- Prophet — Facebook’s Time-Series package
Photo by NeONBRAND on Unsplash
Do you have some pictures you want to post to Instagram, but you are not sure which one will get you the most likes or comments? Well, data science can help you with that! You can create a predictive model that combines various factors, such as the hashtags you use, the length of your post description, and the number of pictures in a carousel. From there you can test your ideas against this model, observe the outputs, and find the image format that is most likely to get you the most likes! This is a great project to work on if you are interested in machine learning, too.
Difficulty: Difficult!
Where to start:
- Don’t push yourself too far on your first version. Start with simple factors like image brightness and the length of the post description, which can be collected through web scraping or Instagram’s API.
- Format these values and train a machine learning model that maps them to the number of likes each post received.
- From here, scale up by adding in hashtags, time of posting, etc., and analysing thousands, or even hundreds of thousands, of posts automatically to grow your data set.
- This is a difficult task which can be scaled up indefinitely so don’t be upset if you struggle on your first attempt. It’s why I put this one at the end of the list.
Skills you’ll learn:
- Collecting, cleaning, and manipulating data
- Predictive modelling using machine learning models
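The first version described above boils down to two steps. First, turn each post into a row of numbers, such as image brightness, caption length, carousel size and hashtag count. Then fit a regressor that predicts likes from those numbers. In this sketch:

- The feature table and the "likes" relationship are synthetic and invented purely so the model has something to learn.
- The random forest is one reasonable choice of model, not the only one.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def image_brightness(pixels):
    """Mean luma (0-255) of an RGB image array of shape (height, width, 3)."""
    # ITU-R BT.601 luma weights for R, G, B.
    return float(np.mean(pixels[..., :3] @ np.array([0.299, 0.587, 0.114])))


rng = np.random.default_rng(0)
n_posts = 500
# Synthetic feature table, one row per post; replace with scraped or API data.
brightness = rng.uniform(40, 220, n_posts)
caption_length = rng.integers(0, 300, n_posts)
n_images = rng.integers(1, 11, n_posts)
n_hashtags = rng.integers(0, 30, n_posts)
X = np.column_stack([brightness, caption_length, n_images, n_hashtags])
# Made-up relationship purely so the example has something to learn.
likes = 50 + 0.5 * brightness + 8 * n_images + 3 * n_hashtags + rng.normal(0, 10, n_posts)

X_train, X_test, y_train, y_test = train_test_split(X, likes, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print("MAE on held-out posts:", mae)

# Compare candidate drafts before posting: [brightness, caption length, images, hashtags].
candidates = np.array([
    [180, 120, 3, 10],  # bright photo, medium caption, 3-image carousel
    [90, 40, 1, 2],     # darker single image, short caption
])
best = int(np.argmax(model.predict(candidates)))
print("most promising draft:", best)
```

For real photos, load the image into a NumPy array (e.g. with Pillow) and compute the brightness column with `image_brightness`.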
Photo by Markus Winkler on Unsplash
The last topic, which I wanted to leave a little more open-ended, is creating a resume-job description matcher. Using NLP techniques like latent semantic analysis, see if you can determine how closely a resume matches a job description.
Where to start:
- Learn more about latent semantic analysis here
- Check out a similar idea related to resumes and job descriptions here.
Skills you’ll learn:
- NLP techniques like latent semantic analysis and/or cosine similarity
- Potentially linear algebra and SVD (singular value decomposition)
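The pipeline behind a latent semantic analysis matcher has three steps:

- Vectorize the documents with TF-IDF.
- Reduce the vectors with a truncated SVD, which is the LSA step.
- Compare the resume against each job description with cosine similarity.

The texts below are hypothetical, and two components on three documents is only a toy setting. Real corpora need many more documents and usually a few hundred components.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents: one resume and two job descriptions.
resume = "Python pandas machine learning models scikit-learn data analysis"
jobs = [
    "Data analyst role using Python, pandas and machine learning",
    "Registered nurse providing patient care in a hospital ward",
]

docs = [resume] + jobs
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# LSA: project the TF-IDF vectors onto a small number of latent "topics".
lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# Similarity of the resume (row 0) to each job description.
scores = cosine_similarity(lsa[:1], lsa[1:])[0]
for job, score in zip(jobs, scores):
    print(f"{score:.2f}  {job}")
```

Dropping the `TruncatedSVD` step and comparing the TF-IDF vectors directly gives the plain cosine-similarity variant. That variant makes a useful baseline against which to judge whether LSA helps.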