Time Series Analysis is widely used in business to extract useful information such as demand forecasts, seasonal products, and demand pattern categories. Here we are going to focus on **Time Series forecasting** (*using Statistical / Machine Learning / Deep Learning models to predict future values*) and **demand pattern categorization** (*categorizing products based on quantity and time*).

In this blog, I am going to explain how we can fit **multiple (1000+)** time series models using Statistical (Classical), Machine Learning & Deep Learning models, along with time-series feature engineering & demand pattern categorization. This series will have the following **5** parts:

Part 1: Data Cleaning & Demand categorization.

Part 2: Fit statistical Time Series models (ARIMA, ETS, CROSTON, etc.) using the fpp3 (tidy forecasting) R package.

Part 3: Time Series Feature Engineering using the timetk R package.

Part 4: Fit Machine Learning models (XGBoost, Random Forest, etc.) & Hyperparameter tuning using the modeltime & tidymodels R packages.

Part 5: Fit Deep Learning models (NBeats & DeepAR) & Hyperparameter tuning using the modeltime & modeltime.gluonts R packages.

Let’s get started!

*PS: This is not the ONLY method to tackle this problem; it is just one approach.*

## The Data

The data I’m using is from the *Food Demand Forecasting hackathon* on AnalyticsVidhya. The goal of this hackathon is to forecast the number of orders for each meal/centre combo for a food delivery service. We have a total of **3,548** meal/centre combos (i.e. 77 centres & 51 meals), meaning that **3,548 time series models** will have to be fitted. In a business environment, this technique is also known as **Scalable Forecasting**.

Let’s import the libraries.

```r
pacman::p_load(tidyverse, magrittr) # data wrangling packages
pacman::p_load(lubridate, tsintermittent, fpp3, modeltime, timetk, modeltime.gluonts,
               tidymodels, modeltime.ensemble, modeltime.resample) # time series model packages
pacman::p_load(foreach, future) # parallel functions
pacman::p_load(viridis, plotly) # visualization packages

theme_set(hrbrthemes::theme_ipsum()) # set default theme
```

Now let’s read the train data, used to fit the time series models, and the submission data, for which we need to predict future values.

```r
meal_demand_tbl <- read.csv(unz("data/raw/train_GzS76OK.zip", "train.csv")) # reading train data
new_tbl <- read.csv("data/raw/test_QoiMO9B.csv") # the data we need to forecast

# summary of data
skimr::skim(meal_demand_tbl %>%
              # remove id
              select(-id) %>%
              # make center & meal id factors
              mutate(center_id = factor(center_id),
                     meal_id = factor(meal_id)))
```

## Data Preprocessing

In this stage, we preprocess the data and transform it to time-series data (i.e. to a *tsibble* object: the special data structure the *fpp3* package uses to handle time series).
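Before the actual preprocessing, it may help to see what a *tsibble* looks like. The sketch below (with made-up meal/centre IDs and order counts) declares `week_date` as the time index and the meal/centre pair as the key, which is the same structure used later for the real data:

```r
library(dplyr)
library(tsibble)

# Toy example: 3 weekly observations for one (hypothetical) meal/centre combo
toy <- tibble(week_date  = as.Date("2016-01-02") + 7 * 0:2,
              meal_id    = 1885,
              center_id  = 55,
              num_orders = c(5, 8, 6)) %>%
  as_tsibble(key = c(meal_id, center_id), index = week_date)
```

Each distinct key combination becomes its own series, so a single tsibble can hold all 3,548 series at once.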

The above summary shows that there are 51 types of meals sold in 77 centres, which makes a total of 3,548 time series, each consisting of 145 weeks. Here we need to forecast the number of orders (`num_orders`) for each meal/centre combo. Furthermore, the `complete_rate` column shows that there are no missing values in any variable.

The column `week` is numbered from 1–145, so we need to convert it to dates. We will also remove the combos (meal/centre) that do not require forecasting.

```r
date_list <- tibble(id = seq(1, 155, 1),
                    week_date = seq(from = as.Date("2016-01-02"), by = "week", length.out = 155))

master_data_tbl <- meal_demand_tbl %>%
  left_join(date_list, by = c("week" = "id")) %>% # joining the dates
  inner_join(distinct(new_tbl, meal_id, center_id),
             by = c("meal_id", "center_id")) %>% # keep only combos we need to forecast
  select(week_date, num_orders, everything(), -c(week, id))
```

Now let’s transform the train and submission data into complete data, i.e. turn the irregular time series into regular time series by inserting new `date` rows. These newly created rows have missing values for `num_orders` and the other variables. Hence, **zero** was imputed for `num_orders`, assuming that no sales occurred in those specific weeks, and the other variables were replaced with their corresponding previous week's values.

For example, the following time series data (Table 1) shows that after the 4th week, data is missing up to the 7th week. Table 2 shows the completed data with new entries for those missing weeks (i.e. weeks 5 & 6).
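The gap-filling step can be sketched on a toy series (the weeks and values here are hypothetical, chosen to mirror the Table 1 → Table 2 example):

```r
library(dplyr)
library(tidyr)
library(tsibble)

# Toy series: weeks 1-4 and week 7 observed, weeks 5 & 6 missing (as in Table 1)
toy_tbl <- tibble(week_date  = as.Date("2016-01-02") + 7 * c(0:3, 6),
                  num_orders = c(10, 12, 9, 11, 13),
                  base_price = c(100, 100, 105, 105, 110)) %>%
  as_tsibble(index = week_date)

toy_complete <- toy_tbl %>%
  fill_gaps(num_orders = 0) %>%          # weeks 5 & 6 appear with zero orders
  fill(base_price, .direction = "down")  # price carried forward from week 4
```

After `fill_gaps()`, the two inserted weeks get `num_orders = 0` and `NA` prices; `fill()` then replaces those `NA`s with the previous week's `base_price` (as in Table 2).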

Then the `emailer_for_promotion` & `homepage_featured` variables are transformed into factors.

```r
master_data_tbl <- master_data_tbl %>%
  as_tsibble(key = c(meal_id, center_id), index = week_date) %>%
  ## num_orders missing value imputation ----
  fill_gaps(num_orders = 0, .full = end()) %>% # complete up to the max week date
  ## X variables missing value imputation ----
  group_by_key() %>%
  fill(emailer_for_promotion, homepage_featured,
       base_price, checkout_price, .direction = "down") %>% # filling other variables
  ungroup() %>%
  ## change variables to factor ----
  mutate(emailer_for_promotion = factor(emailer_for_promotion),
         homepage_featured = factor(homepage_featured))
```

A similar operation is carried out on the `submission` file.

```r
## New Table (Submission file) data wrangling ----
new_tbl <- new_tbl %>%
  left_join(date_list, by = c("week" = "id")) %>% # joining the dates
  full_join(new_data(master_data_tbl, n = 10),
            by = c("center_id", "meal_id", "week_date")) %>%
  as_tsibble(key = c(meal_id, center_id), index = week_date) %>%
  group_by_key() %>%
  fill(emailer_for_promotion, homepage_featured,
       base_price, checkout_price, .direction = "down") %>% # filling other variables
  ungroup() %>%
  # change variables to factor
  mutate(emailer_for_promotion = factor(emailer_for_promotion),
         homepage_featured = factor(homepage_featured))
```

## Visualizing the Time Series Food Data

**Plot 1: Number of orders by Centres**

```r
master_data_tbl %>%
  # randomly pick 4 centres
  distinct(center_id) %>%
  sample_n(4) %>%
  # join the transaction data
  left_join(master_data_tbl, by = "center_id") %>%
  # aggregate to centre level
  group_by(week_date, center_id) %>%
  summarise(num_orders = sum(num_orders, na.rm = TRUE)) %>%
  as_tsibble(key = center_id, index = week_date) %>%
  fill_gaps(num_orders = 0, .full = end()) %>%
  autoplot(num_orders) +
  scale_color_viridis(discrete = TRUE)
```