NEO Share

12 Data Science Projects for 12 Days of Christmas

December 18, 2020 by systems

Building simulations is not only really cool, but also quite relevant with the pandemic! Python simulations are very beneficial to your coding fluency and your understanding of data science, and they are also fun and addictive to play around with.

There are myriad scenarios and factors you can simulate, often in fewer than a couple hundred lines of code. For example, I have articles about simulating a basic pandemic and predicting population control, both of which include code that you’re free to look at!

Difficulty: Anywhere from trivial to super complex!

Where to start:

Skills you’ll learn:

  • Object-oriented programming
  • Simulating randomness in Python
  • Modelling real-life scenarios
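
To make the idea concrete, here is a minimal sketch of a pandemic simulation that touches all three skills above. It is not the code from the articles mentioned earlier; the `Person` class, the `simulate` function, and every parameter value (population size, contacts per day, infection probability, recovery time) are assumptions chosen purely for illustration.

```python
import random


class Person:
    """One individual in the simulated population."""

    def __init__(self):
        self.state = "S"          # S = susceptible, I = infected, R = recovered
        self.days_infected = 0


def simulate(population=200, initial_infected=5, contacts_per_day=4,
             infection_prob=0.1, recovery_days=7, days=60, seed=42):
    """Return a list of (S, I, R) counts, one tuple per simulated day."""
    rng = random.Random(seed)     # seeded so that runs are repeatable
    people = [Person() for _ in range(population)]
    for person in people[:initial_infected]:
        person.state = "I"

    history = []
    for _ in range(days):
        infected_today = [p for p in people if p.state == "I"]
        # Each infected person meets a few random people and may infect them.
        for p in infected_today:
            for other in rng.sample(people, contacts_per_day):
                if other.state == "S" and rng.random() < infection_prob:
                    other.state = "I"
        # Anyone who has been infected long enough recovers.
        for p in infected_today:
            p.days_infected += 1
            if p.days_infected >= recovery_days:
                p.state = "R"
        history.append(tuple(sum(p.state == s for p in people) for s in "SIR"))
    return history


history = simulate()
peak_day, peak = max(enumerate(h[1] for h in history), key=lambda t: t[1])
print(f"Peak of {peak} simultaneous infections on day {peak_day + 1}")
```

Try changing `infection_prob` or `contacts_per_day` and watch how the peak moves; that kind of tinkering is where most of the learning happens.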
Banner vector created by katemangostar — www.freepik.com

Although it is interesting to simulate the spread of disease or social dynamics, we can find uses of data science and programming in business, too.

Forecasting sales for holidays, like Christmas, is incredibly important for determining how much to produce. Too much and there’s stale inventory. Too little and you’ve lost out on potential revenue.

Below are several resources for you to learn and practice retail sales forecasting.

Difficulty: Intermediate

Where to start:

Skills you’ll learn:

  • Predictive modelling, possibly time-series forecasting as well
  • Understanding business statistics
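
As a starting point before reaching for a full forecasting library, a seasonal baseline is worth building by hand. The sketch below repeats last year's pattern and scales it by year-over-year growth. The function name `seasonal_naive_forecast` and the monthly figures are made up for illustration and do not come from any of the resources in this article.

```python
def seasonal_naive_forecast(sales, season_length=12, horizon=12):
    """Forecast `horizon` periods by repeating the last season,
    scaled by the growth between the last two seasons."""
    if len(sales) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = sales[-season_length:]
    previous = sales[-2 * season_length:-season_length]
    growth = sum(last) / sum(previous)
    return [last[i % season_length] * growth for i in range(horizon)]


# Made-up monthly unit sales for two years, with a December spike.
monthly_sales = [
    100, 95, 105, 110, 115, 120, 118, 117, 125, 140, 180, 260,   # year 1
    110, 104, 116, 121, 127, 132, 130, 129, 138, 154, 198, 286,  # year 2
]
forecast = seasonal_naive_forecast(monthly_sales)
print(f"Forecast for next December: {forecast[11]:.0f} units")
```

Any fancier model you build should beat this baseline on held-out data; if it does not, the extra complexity is not paying for itself.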
Photo by Pascal Meier on Unsplash

In a similar vein, let’s bring together science and business to improve our data science skills with a crucial, real-life scenario. In the past nine months, Covid-19 has hugely changed the way we live our lives, and it has had a particularly massive impact on worldwide travel. With the dataset below, explore the data, create visualizations, and even see if you can create a prediction model for airport traffic.

Difficulty: Easy

Where to start:

  • Get this dataset here.
  • Learn how to create data visualizations with Plotly here

Skills you’ll learn:

  • Exploratory Data Analysis
  • Data Visualizations
Photo by MORAN on Unsplash

If you already use Tweetdeck, this project is for you! Tweetdeck is a tool for Twitter that allows you to track your Twitter engagement and a variety of insights in real time. Using the Twitter API and a visualization tool like Dash or Streamlit, you can build a simple web application that serves as your own analytics platform for Twitter!

Difficulty: Intermediate

Where to start:

  • Get familiar with Tweetdeck
  • Learn how to engage with APIs and request an API key from Twitter
  • Learn about a visualization tool to deploy your visualizations, like Dash or Streamlit

Skills you’ll learn:

  • Working with APIs
  • Creating interactive insights and analytics dashboards
Photo by Joe Yates on Unsplash

Arguably one of the most practical data science concepts in the workplace is A/B Testing. And yet, it is a concept that is quite misunderstood because there are a lot of intricacies to it.

More specifically, click-through rate is an extremely important metric for any company with a marketing team. By properly measuring click-through rates, you can optimize the appearance, the messaging, and anything else related to your online advertisements.

Difficulty: Intermediate

Where to start:

  • Sample dataset here.
  • Follow my step-by-step walkthrough here.

Skills you’ll learn:

  • Exploratory Data Analysis
  • How to conduct a proper A/B test for click-through rates
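
One common way to compare two click-through rates is a two-proportion z-test. The sketch below is a minimal version of that test using only the standard library; it is not taken from the walkthrough linked above, and the click and view counts are invented for the example.

```python
from math import erf, sqrt


def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided two-proportion z-test on click-through rates.

    Returns (z statistic, p-value)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Made-up results: variant A vs. variant B, 10,000 views each.
z, p_value = ctr_z_test(clicks_a=200, views_a=10_000,
                        clicks_b=260, views_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only says the difference is unlikely to be noise. Deciding the sample size before the test and not peeking early are among the intricacies that make A/B testing easy to get wrong.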
Photo by Thibault Penin on Unsplash

The recommendation algorithms used by modern social media platforms and content aggregators are extremely complex and constantly developing. What better way to understand how they work and improve than by building one yourself?

Difficulty: Intermediate-Advanced

Where to start:

Skills you’ll learn:

  • Building Recommendation Systems
  • SVD, matrix factorization
Image by mcmurryjulie — Pixabay

Web scraping is simple to learn and extremely useful! Scraping a customer review website, like Trustpilot, is valuable for a company because it lets them track review trends (getting better or worse) and see what customers are saying via NLP.

Difficulty: Easy

Where to start:

Skills you’ll learn:

  • Webscraping data
  • Analyzing customer reviews
  • Take it further and apply NLP to extract insights from reviews.
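
The parsing half of a scraper can be practised without touching the network. The sketch below pulls review text out of an HTML snippet using Python's built-in `html.parser`. The snippet and its `review-text` class name are invented for this example and are not Trustpilot's real markup; on a real site you would inspect the page to find the right tags, and check the site's terms before scraping.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collects the text of every <p class="review-text"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "p" and ("class", "review-text") in attrs:
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

    def handle_data(self, data):
        if self._in_review:
            self.reviews[-1] += data


# Hypothetical page fragment standing in for a downloaded review page.
sample_html = """
<div class="review"><p class="review-text">Fast delivery, great service.</p></div>
<div class="review"><p class="review-text">Item arrived broken.</p></div>
"""

parser = ReviewParser()
parser.feed(sample_html)
for review in parser.reviews:
    print(review)
```

Once the text is in a list, counting words or scoring sentiment per review is a natural next step.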
Image by Harish Sharma from Pixabay

What do you know, we’ve come full circle back to our challenge on retail analytics! In this problem, however, the goal is to use statistics to cluster customers into similar groups so that you can identify the customer segments you want to market your business to!

Difficulty: Intermediate-Advanced

Where to start:

Skills you’ll learn:

  • Clustering techniques
  • Dimensionality reduction
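
K-means is one common clustering technique for this kind of segmentation. Below is a bare-bones sketch written from scratch so the mechanics are visible. The customer features (think of them as already-scaled spend and visit frequency) are made up, and the farthest-point initialisation is just one simple, deterministic choice made for this example; libraries such as scikit-learn use more robust schemes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids) for a list of equal-length tuples."""
    # Farthest-point initialisation: start with the first point, then keep
    # adding the point that is farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break                      # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Made-up customers as (spend score, visit score) pairs.
customers = [
    (1, 1), (1.5, 2), (2, 1),        # low spend, few visits
    (8, 8), (9, 9), (8, 9.5),        # high spend, frequent visits
    (1, 9), (2, 8), (1.5, 9.5),      # low spend, frequent visits
]
labels, centroids = kmeans(customers, k=3)
print(labels)
```

With real customer data you would scale the features first, since k-means is sensitive to the units each column is measured in.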
Photo by Matthew Henry on Unsplash

This dataset is composed of power consumption data from PJM’s website. PJM is a regional transmission organization in the United States. Using this dataset, see if you can build a time series model to predict energy consumption. In addition to that, see if you can find trends around hours of the day, holiday energy usage, and long term trends!

Difficulty: Intermediate-Advanced

Where to start:

  • Link to dataset here.
  • YouTube tutorial on Time Series Analysis in Python here.
Photo by Ishant Mishra on Unsplash

What if you want to predict whether Tesla stock will shoot to the mooooon? With time series forecasting, you can try to predict the trajectory of a stock. To make it easier, you can use Facebook’s time-series library called Prophet, which does a lot of the heavy lifting for you.

Difficulty: Intermediate

Where to start:

Skills you’ll learn:

  • More time series knowledge
  • Prophet — Facebook’s Time-Series package
Photo by NeONBRAND on Unsplash

Do you have some pictures you want to post to Instagram, but you are not sure which one will get you the most likes or comments? Well, data science can help you with that! You can create a predictive model based around various factors, such as the hashtags you use, the length of your post description, the number of pictures in a carousel, and throw it all together. From there you can test your ideas against this model, observe the outputs, and find the image format that is most likely to get you the most likes! This is a great project to work on if you are interested in machine learning, too.

Difficulty: Difficult!

Where to start:

  • Don’t push yourself too far on your first version. Just take factors like brightness of image, length of post description, etc., which can be collected through web scraping or Instagram’s API.
  • Format these values and use a machine learning or predictive model to map these to how many likes each post got
  • From here, scale up by adding in hashtags, time of posting, etc., and automatically analysing thousands, or even hundreds of thousands, of posts to grow your data set.
  • This is a difficult task which can be scaled up indefinitely so don’t be upset if you struggle on your first attempt. It’s why I put this one at the end of the list.

Skills you’ll learn:

  • Collecting, cleaning, and manipulating data
  • Predictive modelling using machine learning models
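
A first version of the "map factors to likes" step can be as small as a one-feature linear regression. The sketch below fits a line by ordinary least squares. The `fit_line` helper and the caption-length and like counts are invented for illustration, and a real model would use many more features and far more posts.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up posts: caption length in characters and likes received.
caption_lengths = [20, 45, 60, 80, 120, 150]
likes = [110, 150, 160, 210, 260, 300]

slope, intercept = fit_line(caption_lengths, likes)
predicted = slope * 100 + intercept
print(f"Predicted likes for a 100-character caption: {predicted:.0f}")
```

Once this works end to end, swapping in a multi-feature model is mostly a matter of collecting more columns.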
Photo by Markus Winkler on Unsplash

The last topic that I wanted to leave a little more open-ended is creating a resume-job description matcher. By using NLP techniques like latent semantic analysis, see if you can determine how closely a resume matches a job description.

Where to start:

  • Learn more about latent semantic analysis here
  • Check out a similar idea related to resumes and job descriptions here.

Skills you’ll learn:

  • NLP techniques like latent semantic analysis and/or cosine similarity
  • Potentially linear algebra and SVD (singular value decomposition)
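
Cosine similarity is the easiest of these pieces to try first. The sketch below compares raw word counts, which is plain bag-of-words cosine similarity and not latent semantic analysis (LSA would add an SVD step on a term-document matrix). The sample resume and job texts are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two texts' word-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Made-up texts for illustration.
resume = "Python developer with experience in machine learning and SQL"
job_ml = "Looking for a machine learning engineer skilled in Python and SQL"
job_chef = "Restaurant seeks a pastry chef for weekend shifts"

print(f"ML job:   {cosine_similarity(resume, job_ml):.2f}")
print(f"Chef job: {cosine_similarity(resume, job_chef):.2f}")
```

Raw counts reward shared filler words like "and" or "in", so removing stop words or switching to TF-IDF weights is a sensible next improvement before moving on to LSA.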

Filed Under: Machine Learning

Copyright © 2021 NEO Share
