Demand forecasting is the process of estimating the value of demand for future periods. In other words, it consists of estimating the future consumption of products or services by adopting either a qualitative approach (based on judgment) or a quantitative approach (based on historical data).
In this case study, we are interested in forecasting the demand of a self-service bicycle company using transaction history. To do so, we will use a traditional forecasting model (double exponential smoothing) and three models based on the most commonly used machine learning algorithms: LSTM, Regression trees, and Support vector regression. Then we will compare the prediction performance of these models.
The complete Python code used to analyze the data and implement the models is available via this link.
A self-service bicycle system (VLS) makes bicycles available to the public, free of charge or not. This mobility service allows people to travel locally, mainly in urban areas. This bike rental is a form of collaborative consumption and thus removes three obstacles to cycling: parking at home, theft, and maintenance of one’s personal bike.
The data used for this study was provided by Transport for London (TfL),
the local public body responsible for public transport in the City of London
and Greater London, UK, which published it on Kaggle. You can retrieve the data via the following link: https://www.kaggle.com/hmavrodiev/london-bike-sharing-dataset.
Copyright Notice: Powered by TfL Open Data. Contains OS data © Crown copyright and database rights 2016, and Geomni UK Map data © and database rights 2019.
Description of the data structure
The database used for the study contains 17,414 transaction records (i.e. hourly counts of bike rentals) spread over a period of two years (from January 4, 2015, to January 3, 2017). The records are made hourly; each record contains nine pieces of information, including:
timestamp: date field
demand: the number of bike rentals in this period (1 hour)
t1: actual temperature in C
t2: "feels like" temperature in C
hum: humidity in percentage
wind_speed: wind speed in km / h
weather_code: weather category
is_holiday: boolean field — 1 if the day is a holiday, 0 otherwise
is_weekend: boolean field — 1 if the day falls on a weekend, 0 otherwise
season: 0-spring; 1-summer; 2-fall; 3-winter.
Prior to using the data in the selected models, a data pre-processing step was necessary to prepare the data in the most appropriate format for these models and to understand it. For example, to be able to view the average demand by hour, day, week, and month, the fields hour, day_of_week, day_of_month, and month are added to each record. Other changes made to this data vary depending on the needs of each model's implementation.
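The calendar fields mentioned above can be derived directly from the timestamp with pandas. The two-row frame below is a hypothetical miniature of the dataset; in the actual study the frame is loaded from the Kaggle CSV.

```python
import pandas as pd

# Hypothetical miniature of the TfL dataset (the real frame comes from the
# Kaggle CSV); only the timestamp is needed to derive the calendar fields.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-01-04 00:00", "2015-01-04 01:00"]),
    "demand": [182, 138],
})

# Fields added for the hourly/daily/weekly/monthly aggregations.
df["hour"] = df["timestamp"].dt.hour
df["day_of_week"] = df["timestamp"].dt.dayofweek   # 0 = Monday ... 6 = Sunday
df["day_of_month"] = df["timestamp"].dt.day
df["month"] = df["timestamp"].dt.month

print(df[["hour", "day_of_week", "day_of_month", "month"]])
```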
Interpretation of the distribution of demand
The table below (Figure 10) presents descriptive statistics on the distribution of all fields in the database: for each field, the number of entries, the mean, the standard deviation, the minimum and maximum values, and the 25th, 50th, and 75th percentiles. Here we are only interested in the distribution of demand.
There are a total of 17,414 entries in the demand field. The number of bike rental requests varies between 0 (minimum) and 7,860 (maximum), with a mean of 1,143 and a standard deviation (the dispersion of values around the mean) of 1,085. This standard deviation shows how variable demand is. The median, which divides the series into two halves, is 844: half of the recorded demand values are greater than or equal to 844, and the other half are less than or equal to 844.
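These statistics come straight from pandas' `describe()`. A toy series stands in here for the real 17,414-row demand column (whose exact column name in the CSV is an assumption we leave aside):

```python
import pandas as pd

# Toy stand-in for the demand column, for illustration only; the study
# applies describe() to the full 17,414-row series.
demand = pd.Series([0, 200, 844, 1500, 7860])

stats = demand.describe()
# describe() returns count, mean, std, min, 25%, 50% (median), 75%, max.
print(stats["50%"])  # the median splits the series into two halves
```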
Understanding the data
Before implementing any forecast model, it is very useful to visualize the demand data in order to understand it. The figure below shows the evolution of demand over time during the two years.
We can aggregate the data to observe the same evolution per month; the graph is given below. It can be noticed that demand is clearly greater in summer (July — October) than in other periods, and minimal in winter (January — April). It is also very noticeable that the series has a seasonality component.
The figure below shows the first 1,000 hours of the time series in blue, to give a clearer view of the evolution of demand, with the moving average in orange. Although the moving average is itself a simple model, we use it here to observe how the mean evolves over time.
The average varies around a fixed value. We can therefore conclude that demand is stationary and does not contain a trend component.
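This stationarity check can be sketched as follows. The series below is synthetic (a fixed level plus a 24-hour cycle and noise — an assumption standing in for the real hourly demand); a 24-hour rolling mean averages the daily cycle out, so if it hovers around a constant the series has no trend:

```python
import numpy as np
import pandas as pd

# Synthetic stationary, seasonal series standing in for hourly demand:
# a fixed level (1143, the observed mean) plus a 24-hour cycle and noise.
rng = np.random.default_rng(0)
hours = np.arange(1000)
demand = pd.Series(1143 + 500 * np.sin(2 * np.pi * hours / 24)
                   + rng.normal(0, 50, 1000))

# A 24-hour moving average smooths out the daily seasonality; if it stays
# close to a fixed value, the series has no trend component.
moving_avg = demand.rolling(window=24).mean()

print(moving_avg.dropna().mean())  # should stay near the level of 1143
```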
Demand is higher on weekdays than on weekends (Next figure, where 0 represents Monday, 1 represents Tuesday, and so on).
We can also visualize the average demand per hour to see when it is high and when it is low (Figure 14). The orange curve represents holidays and the blue curve represents other days. Overall, demand is lower on holidays. Demand also peaks from 7:00 a.m. to 9:00 a.m. and from 4:00 p.m. to 7:00 p.m.
The next figure compares the average demand per hour for the four seasons. The orange curve represents summer, the green curve fall, the blue curve spring, and the red curve winter. As noted above, bicycle rentals are in higher demand in the summer; this comparison also makes it clear that cold weather reduces demand, which is understandable since it is difficult to ride a bike in the cold.
The implementation process is the same for all four models:
- Data pre-processing: the first step in implementing any predictive model is to cleanse and correctly format the data according to the needs of the model to be implemented.
- Creation of the training and test sets: in this step, we divide the database into two parts; the larger part is used to train the model and the other part is used to test its predictions. In our case, we take 12,414 hours as the training set and 5,000 hours as the test set.
- Model creation: the forecast model is created and trained on the training set.
- Model evaluation: the performance of the forecasting model is evaluated by calculating indicators. Here we use the mean square error (MSE) and the mean error (ME).
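The split and the two indicators can be sketched as follows. The placeholder series and the sign convention chosen for the mean error are assumptions for illustration:

```python
import numpy as np

# Placeholder for the 17,414 hourly records, in chronological order.
demand = np.arange(17414, dtype=float)

# Chronological split: 12,414 hours of training, 5,000 hours of test.
train, test = demand[:12414], demand[12414:]

def mse(actual, forecast):
    # Mean square error: average squared deviation; penalizes large misses.
    return np.mean((actual - forecast) ** 2)

def mean_error(actual, forecast):
    # Mean error: signed bias. The sign convention (forecast minus actual,
    # so positive means over-prediction on average) is an assumption.
    return np.mean(forecast - actual)

print(mse(np.array([1.0, 2.0]), np.array([2.0, 2.0])))  # 0.5
```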
Double exponential smoothing
Through the data exploration done above, it appears that demand is stationary and that the time series presents only level and seasonality components. This observation leads us to choose, among time series models, double exponential smoothing, since it captures these two components.
To carry out the forecast by this model, a part of the database was used to implement the model and the other to test it.
The mean square error on the test data is 2,541,646 and the mean error is -1,037. The graphs below present the performance of the model over the first 120 hours of test data (for a lighter visualization), where the blue curve represents the test data and the orange curve represents the forecast.
The MSE is very large, which means that the model did not really succeed at the forecast. This can also be seen on the graph.
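As a minimal sketch of how double exponential smoothing (Holt's method) produces one-step-ahead forecasts from its level and trend updates — the smoothing parameters and toy series below are illustrative assumptions, not the study's:

```python
import numpy as np

def double_exponential_smoothing(series, alpha, beta):
    """Holt's double exponential smoothing: one-step-ahead forecasts.
    alpha smooths the level, beta smooths the trend."""
    level, trend = series[0], series[1] - series[0]
    forecasts = [series[0]]
    for t in range(1, len(series)):
        forecasts.append(level + trend)  # forecast made before seeing y_t
        prev_level = level
        level = alpha * series[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return np.array(forecasts)

# On a purely linear toy series the method tracks it exactly.
y = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
print(double_exponential_smoothing(y, 0.5, 0.5))  # [10. 12. 14. 16. 18.]
```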
Support vector regression
Support Vector Machines (SVM) are well known for classification problems. Their application to regression problems is called Support Vector Regression (SVR).
In most models, the goal is to minimize prediction error. But when you do not care about the size of the error as long as it stays within a defined range, SVR becomes very useful: it gives the flexibility to define an acceptable margin of error and find a line that fits the data within it.
To build our model, we choose demand as the dependent variable (the variable to be predicted) and, as independent variables, all the other information recorded for the hour, i.e. t1, t2, hum, wind_speed, weather_code, is_holiday, is_weekend, and season.
After performing the prediction on the test data, the mean square error is 1,135,314 and the mean error is -338. The graphs below show the performance of the model over the first 120 hours of the test data (for a clearer visualization), where the blue curve represents the test data and the orange one the forecast. This model also performed poorly on the prediction, but did better than double exponential smoothing.
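The SVR setup can be sketched with scikit-learn. The rows below are synthetic stand-ins for the eight feature columns, and the kernel and hyper-parameters are assumptions (the study does not state them):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic rows standing in for t1, t2, hum, wind_speed, weather_code,
# is_holiday, is_weekend, season; demand depends on two of them plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
y = 1000 + 300 * X[:, 0] - 200 * X[:, 2] + rng.normal(0, 50, 200)

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# Scaling matters for SVR; the RBF kernel and C/epsilon are assumptions.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100, epsilon=10))
model.fit(X_train, y_train)
pred = model.predict(X_test)

print(np.mean((y_test - pred) ** 2))  # mean square error on the hold-out set
```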
Regression trees
The regression tree is a machine learning algorithm that belongs to the class
of decision tree algorithms whose specificity is the prediction of a number (in
opposition to the classification tree that predicts a category or label).
To make a prediction, the tree starts at its root with a first yes/no question and, depending on the answer, continues to ask new yes/no questions until it arrives at a final prediction.
The question we will ask the machine learning algorithm is the following: based on the last 50 hours of demand, what will the demand be within the next hour?
We will train the model by providing the data in a specific layout:
– 50 consecutive hours of demand as input,
– the demand for the next hour as output.
The algorithm will learn the relationship between the last 50 hours of demand and the demand for the next hour.
After performing the forecast on the test data, the mean square error is 175,341 and the mean error is -24. It can be noted that the model was able to capture the seasonality component; the forecast is relatively good.
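The sliding-window layout described above can be sketched with scikit-learn's decision tree regressor. The seasonal series and tree depth below are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic seasonal series standing in for hourly demand.
hours = np.arange(600)
demand = 1000 + 400 * np.sin(2 * np.pi * hours / 24)

# Sliding window: the last 50 hours as input, the next hour as target.
WINDOW = 50
X = np.array([demand[i:i + WINDOW] for i in range(len(demand) - WINDOW)])
y = demand[WINDOW:]

tree = DecisionTreeRegressor(max_depth=8, random_state=0)
tree.fit(X[:500], y[:500])        # train on the first windows
pred = tree.predict(X[500:])      # forecast the remaining hours

print(np.mean((y[500:] - pred) ** 2))  # mean square error on the hold-out
```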
Long short-term memory
As its name suggests, long short-term memory (LSTM) is a type of RNN designed to keep the early inputs of a training sequence from being forgotten through the overwriting effect of more recent states, as happens in a classical RNN.
In fact, recurrent neural networks (RNN) can predict the following value(s)
in a data sequence. A sequence is stored in matrix form, where each line is a vector of features describing it; the order of the lines in the matrix matters. RNNs contain loops: each unit has a state and receives two inputs, the state of the previous layer and the state of this layer at the previous time step. This structure creates a memory effect that is used when predicting the next time step.
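The gating that gives the LSTM its selective memory can be sketched in a few lines of NumPy (a single cell step with random weights, purely for illustration — the study's actual model is built with a deep learning framework):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. The gates decide what to forget, what to write,
    and what to expose, so the cell state c is not simply overwritten."""
    z = W @ x + U @ h_prev + b       # the four gate pre-activations, stacked
    H = h_prev.size
    f = sigmoid(z[:H])               # forget gate
    i = sigmoid(z[H:2 * H])          # input gate
    g = np.tanh(z[2 * H:3 * H])      # candidate cell update
    o = sigmoid(z[3 * H:])           # output gate
    c = f * c_prev + i * g           # cell state: kept memory + new info
    h = o * np.tanh(c)               # hidden state passed to the next step
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 13                         # hidden size, features per hour
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)

h = c = np.zeros(H)
for t in range(24):                  # one day of hourly feature vectors
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # (4,)
```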
To use this algorithm, we will train the model:
– as input, a sequence of daily transactions (a series of 24 hours, each containing the information for that hour except the demand); in other words, each element of the sequence will be a matrix of size 24 x 13, where 24 is the number of hours in the day and 13 is the number of other fields describing the hour,
– and as output, a sequence of the daily demand (for 24 hours); in other words, each element of this sequence will be a vector of size 24.
The result of this algorithm is the creation of a model that captures the relationship that exists between all training sequences and the evolution of demand.
In fact, this model is a kind of mixture of the causal method and the time series method, since it assumes both that environmental variables have an impact on demand and that there is a dependency between the demand of one period and that of the following period.
After performing the forecast on the test data, the mean square error is 91,083 and the mean error is -148. This model performed very well because it captures both the seasonality and the overall change in rental demand over time.
The MSE is an indicator of forecast quality: the lower it is, the fewer errors there are in the forecast. The ME indicates how biased the model is (its tendency to under- or over-predict).
Overall, the machine learning models performed better than the double exponential smoothing model. The long short-term memory model is the best and greatly outperforms double exponential smoothing; however, it is more biased than the regression trees model. The ranking of the forecasting models from best to worst is as follows:
- LSTM
- Regression Trees
- Support Vector Regression
- Double exponential smoothing