A gentle introduction to price elasticity of demand (PED) in Python
Where were you in 2016–2017? I was in NYC, the city that starts trends, from fashion to food and from entertainment to music. I jumped on the bandwagon for a few of them, but the one that has stuck with me is avocado, in the form of avocado toast or guacamole.
The year 2017 saw a surge in the price of the fruit as the avocado scene boomed; I have paid as much as $3.65 for a single Hass avocado at my local Whole Foods. There is a joke that all millennials are poor and live with their parents because they spend all their money on avocado toast. For some reason, the fruit is considered premium and is priced accordingly; one of the reasons is demand. There could be other factors such as seasonality, competition, and weather, but as good ol’ economic theory tells us, price can be driven by demand to a great extent.
In simple terms, if something is less expensive then more people will buy it, and if something is expensive then fewer people will buy it. This behaviour is quantified with the help of a concept called price elasticity of demand (PED).
I wanted to analyse the concept with the following questions:
1. Is there a relation between demand and supply of avocados over time?
2. What is the price elasticity of demand for avocado?
3. Is demand dependent only on price, or are there other factors too?
1. Price Elasticity of Demand
PED measures the percentage change in demand Q when, holding everything else constant, the price is changed by one percent.
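PED = (∂Q/∂P) × (P/Q)
Here ∂Q is the change in the quantity demanded and ∂P is the change in the price. Because demand usually falls as price rises, PED is typically negative and is often quoted as a magnitude.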
The term elasticity is synonymous with sensitivity.
In economic theory, elasticity is a measure of how sensitive demand or supply is to price.
In marketing, it is how sensitive consumers are to a change in the price of a product.
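To make the formula concrete, here is a minimal sketch that evaluates point elasticity along a hypothetical linear demand curve Q = a − b·P. The curve and the numbers a = 100 and b = 20 are made up for illustration and have nothing to do with the avocado data; `point_elasticity` and `demand` are names I am introducing here.

```python
def point_elasticity(dq_dp, price, quantity):
    """Point PED = (dQ/dP) * (P / Q)."""
    return dq_dp * price / quantity


# Hypothetical linear demand curve Q = a - b * P (a and b are made-up numbers).
a, b = 100.0, 20.0


def demand(price):
    """Quantity demanded at a given price on the toy curve."""
    return a - b * price


for price in (1.0, 2.5, 4.0):
    q = demand(price)
    ped = point_elasticity(-b, price, q)  # slope of the toy curve is -b
    print(f"P={price:.2f}  Q={q:.1f}  PED={ped:.2f}")
# P=1.00  Q=80.0  PED=-0.25
# P=2.50  Q=50.0  PED=-1.00
# P=4.00  Q=20.0  PED=-4.00
```

On a straight-line demand curve the slope is constant but P/Q is not, so the same curve is inelastic at low prices and elastic at high ones.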
2. Why PED?
Why would I want to measure PED for my product/good/service?
- It can help me make better pricing decisions, i.e. what is the optimal price I can set for my product?
- If I decrease the price, what will be the impact on demand?
- Will revenue fall or rise if I increase or decrease the price?
PED can provide a starting point to many other questions as well.
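The revenue question above can be sketched in a few lines. This is only a first-order approximation (%ΔQ ≈ PED × %ΔP), and the starting price, quantity, and elasticities are hypothetical numbers I picked for illustration; `revenue_after_price_change` is not part of any library.

```python
def revenue_after_price_change(price, quantity, ped, pct_change):
    """Approximate revenue after a small relative price change.

    ped is the signed price elasticity; pct_change is e.g. 0.05 for +5%.
    Uses the first-order approximation %dQ ~= PED * %dP.
    """
    new_price = price * (1 + pct_change)
    new_quantity = quantity * (1 + ped * pct_change)
    return new_price * new_quantity


# Hypothetical baseline: 1,000 units sold at $1.50 each.
base = 1.50 * 1000
elastic = revenue_after_price_change(1.50, 1000, ped=-2.0, pct_change=0.05)
inelastic = revenue_after_price_change(1.50, 1000, ped=-0.5, pct_change=0.05)

print(base, round(elastic, 3), round(inelastic, 3))
```

With these toy numbers, a 5% price increase lowers revenue when demand is elastic and raises it when demand is inelastic, which is why the magnitude of PED matters for pricing.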
With this rudimentary understanding, let’s move ahead and let our green fingers work.
3. Data
Avocado prices data is available on Kaggle and captures the average price of the fruit along with the quantity sold from 2015 to 2018.
There are three columns which Avocado aficionados will be aware of:
4046 — Small/Medium Hass Avocado (~3–5oz avocado)
4225 — Large Hass Avocado (~8–10oz avocado)
4770 — Extra Large Hass Avocado (~10–15oz avocado)
The rest of the columns are self-explanatory.
Let’s load the data.
import turicreate as tc

sf = tc.SFrame.read_csv("avocado.csv")
sf.print_rows(3)
Oh, BTW… I am using Apple’s turicreate framework (https://github.com/apple/turicreate), which you can install easily using:
pip install -U turicreate
If you pledge allegiance to the pandas DataFrame, you can use that instead, and if at any point during the execution you find you don’t like turicreate, you can switch to a DataFrame using
df = sf.to_dataframe()
There are a few ugly column names; let’s fix them first.
sf = sf.rename({'Total Volume': 'Volume'})
sf = sf.rename({'Total Bags': 'Bags'})
sf = sf.rename({'4225': 'twent_fv_Av'})
sf = sf.rename({'4046': 'for_si_Av'})
sf = sf.rename({'4770': 'sev_sev_Av'})
sf.print_rows(3)
4. Simple Features
I tried to plot the data but 18,249 points didn’t make much sense. So, let’s roll the data up for the time being and let’s see what happens.
Let’s concoct a ‘quarter’ variable from the Date column.
import datetime as dt

qtr = []
for item in sf['Date']:
    # Parse the date string and map months 1-12 to quarters 1-4
    date_i = dt.datetime.strptime(item, '%Y-%m-%d')
    qtr.append((date_i.month + 2) // 3)
sf['qtr'] = qtr
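If the `(month + 2) // 3` trick looks like magic, this small standalone check shows what it produces for every month of a year. `quarter_of` is a helper name I am introducing for the sketch, and the dates are arbitrary.

```python
import datetime as dt


def quarter_of(date_str):
    """Return the calendar quarter (1-4) for a 'YYYY-MM-DD' string."""
    month = dt.datetime.strptime(date_str, "%Y-%m-%d").month
    return (month + 2) // 3


# One arbitrary date per month, January through December.
quarters = [quarter_of(f"2017-{m:02d}-01") for m in range(1, 13)]
print(quarters)
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```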
An interim SFrame that will have the data rolled up in quarters and year:
sf_g = sf.groupby(['year', 'qtr'],
                  tc.aggregate.MEAN('Volume'),
                  tc.aggregate.MEAN('AveragePrice'))
sf_g = sf_g.sort(['year', 'qtr'])
# Let's treat the ugly names of the columns as well
sf_g = sf_g.rename({'Avg of Volume': 'Volume', 'Avg of AveragePrice': 'Price'})
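For the pandas crowd, the same roll-up could look roughly like the sketch below. It runs on a tiny made-up frame rather than the real avocado data, so the numbers mean nothing; only the column names are borrowed from the dataset.

```python
import pandas as pd

# Toy rows standing in for the avocado data (values are made up).
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016],
    "qtr": [1, 1, 2, 1],
    "Volume": [100.0, 300.0, 250.0, 400.0],
    "AveragePrice": [1.0, 1.2, 1.5, 1.3],
})

# Mean volume and mean price per (year, quarter), sorted chronologically.
rolled = (
    toy.groupby(["year", "qtr"], as_index=False)[["Volume", "AveragePrice"]]
    .mean()
    .rename(columns={"AveragePrice": "Price"})
    .sort_values(["year", "qtr"])
    .reset_index(drop=True)
)
print(rolled)
```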
Let’s plot the curve
tc.visualization.set_target(target='browser')
tc.show(sf_g['Price'], sf_g['Volume'], xlabel="Price",
ylabel="Demand", title="Demand-Supply Curve")
It’s not the conspicuous downward slope we see in textbooks, but if we look carefully, the number of avocados sold more or less decreases as the price increases.
Maybe I need a better curve.
5. Demand-Supply
Let’s switch to pandas dataframe for a while and plot the time-series of price and volume.
import matplotlib.pyplot as plt
df_g = sf_g.to_dataframe()

def plt_x():
    fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True, subplot_kw=dict(frameon=False), figsize=(15, 8))
    plt.subplots_adjust(hspace=.0)
    ax1.grid()
    ax2.grid()
    ax1.plot(df_g['Price'], color='g')
    ax2.plot(df_g['Volume'], color='b')
    ax1.set_ylabel('Price')
    ax2.set_ylabel('Volume')

plt_x()
As we traverse from 2015 to 2018 with the 13 data points, the increasing peaks in price have corresponding valleys in volume and vice-versa.
This is a much better graph and is consistent with the economic theory of demand and supply.
We need something more robust to confirm our hypothesis, maybe a statistical approach.
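Before reaching for a regression, one quick numerical check of the peaks-and-valleys pattern is the Pearson correlation between price and volume. This is a different and much cruder technique than the OLS model, and the four data points below are invented for illustration, not taken from the avocado data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up quarterly prices and volumes that move in opposite directions.
price = [1.2, 1.4, 1.7, 1.3]
volume = [1.1, 0.9, 0.7, 1.0]

r = pearson(price, volume)
print(round(r, 2))
```

A value close to −1 would say that price and volume move in opposite directions; it says nothing about causation or about the other factors at play.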
6. OLS Modeling
Statsmodels provides an easy way into OLS, and that’s what we will use here.
Null hypothesis: There is no relationship between Price and Volume.
One interesting thing about the dataset is that Volume = ‘4046’ + ‘4225’ + ‘4770’ + Bags, so we can have our pick of the variables in the model.
from statsmodels.formula.api import ols

df = sf.to_dataframe()  # SFrame to DataFrame
model_1 = ols("AveragePrice ~ twent_fv_Av + Bags", data=df).fit()
print(model_1.summary())
The p-values are less than the significance level α = 0.05, so the null hypothesis can be rejected.
The problem with the model above is its R², which is quite poor. This indicates that, in the case of avocados, price isn’t only a function of demand but also of other factors.
Let’s make a few changes.
We have a ‘type’ variable which greatly affects the price of the fruit, organic being more expensive than conventional. We can include it in the model.
import pandas as pd

X = df[['twent_fv_Av', 'Bags', 'type', 'qtr']]
y = df['AveragePrice']
# Encode the categorical variable; dtype=float keeps the design matrix numeric
X = pd.get_dummies(X, prefix=["type"], columns=["type"], drop_first=True, dtype=float)
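To see what the encoding step does, here is a toy example with three made-up rows. With `drop_first=True`, the first level (‘conventional’) becomes the baseline and only a `type_organic` indicator column is kept; passing `dtype=float` is my own choice in this sketch so that the indicator is numeric rather than boolean.

```python
import pandas as pd

# Three made-up rows; only the column names mirror the avocado data.
toy = pd.DataFrame({
    "Bags": [10.0, 20.0, 30.0],
    "type": ["conventional", "organic", "organic"],
})

encoded = pd.get_dummies(
    toy, prefix=["type"], columns=["type"], drop_first=True, dtype=float
)
print(encoded.columns.tolist())
# ['Bags', 'type_organic']
print(encoded["type_organic"].tolist())
# [0.0, 1.0, 1.0]
```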
Fit the model again.
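import statsmodels.api as sm
# Note: sm.OLS fits without an intercept unless a constant column is added to X
mod = sm.OLS(y, X).fit()
mod.summary()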
Interpreting the results graphically can be more intuitive.
fig = plt.figure(figsize=(8, 8))
fig = sm.graphics.plot_partregress_grid(mod, fig=fig)
A few trends can be seen in the partial regression plots (the effect of each variable on the response variable, i.e. Average Price).
I won’t call them strong but I have seen worse.
1. The R² of the model seems fine, which means that the 4 variables combined explain 88% of the variability in Price. Not bad!
2. The p-values of all variables except ‘Bags’ are less than α = 0.05, showing that they exert an influence on price.
So, the price of avocado is dependent on Volume, the type of avocado, and the quarter in which it was sold.
7. PED and Products
We know the formula of PED but what does it represent in the real world?
Simple rule (using the magnitude of PED, since its sign is usually negative):
|PED| > 1: Elastic product
|PED| < 1: Inelastic product
Elastic products are those that are highly sensitive to price changes, i.e. a small change in price can cause a major shift in demand. Luxury products such as cars, perfumes, etc. should be elastic because they are discretionary items, i.e. they are ‘wants’, not ‘needs’.
Inelastic products are those that aren’t very sensitive to price changes, i.e. even a large change in price won’t have a major impact on demand. Items needed for life’s sustenance fall into this cohort. If you are ill, you will buy the medicine whether it costs $5 or $10.
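The rule of thumb above fits in a tiny helper. `classify_ped` is a name I made up for this sketch; it works on the magnitude of PED so that it does not matter whether the sign is kept, and it treats a magnitude of exactly 1 as the boundary case usually called unit elastic.

```python
def classify_ped(ped):
    """Label a price elasticity value by its magnitude."""
    magnitude = abs(ped)
    if magnitude > 1:
        return "elastic"
    if magnitude < 1:
        return "inelastic"
    return "unit elastic"


print(classify_ped(-2.5))  # elastic
print(classify_ped(-0.3))  # inelastic
print(classify_ped(-1.0))  # unit elastic
```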
I believe avocado should be an elastic product and its PED should be >1. Let’s find out.
Given the fact that the larger dataset is much noisier, let’s use the one that’s rolled up to quarters.
import seaborn as sns

def plt_reg():
    fig, scatter = plt.subplots(figsize=(15, 7))
    sns.set_theme(color_codes=True)
    # Scatter of quarterly Price vs. Volume with a fitted regression line
    sns.regplot(x=df_g['Price'], y=df_g['Volume'])

plt_reg()
Hmm… we know that PED = (∂Q/∂P) × (P/Q)
∂Q/∂P = (0.7 − 1.1)/(1.7 − 1.2) = −0.8
P/Q = 1.7/0.7 ≈ 2.43
PED ≈ −0.8 × 2.43 ≈ −1.94, i.e. a magnitude of 1.94
As expected, the magnitude of PED for avocado is > 1, making it an elastic commodity.
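The arithmetic above can be reproduced in a few lines. The two (price, volume) pairs are the numbers used in the calculation; the variable names are mine, and P/Q is evaluated at the higher-price point, as in the text.

```python
# The two (price, volume) points used in the calculation above.
p1, q1 = 1.2, 1.1
p2, q2 = 1.7, 0.7

slope = (q2 - q1) / (p2 - p1)  # dQ/dP between the two points
ped = slope * (p2 / q2)        # P/Q taken at the higher-price point

print(round(slope, 2), round(ped, 2))
# -0.8 -1.94
```

Evaluating P/Q at the other point would give a different number, since point elasticity changes along a straight line.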
Source code can be found on my GitHub.
Here are the answers to the three questions I started with:
1. Is there a relation between demand and supply of avocados over time? Yes, it has been established graphically and statistically. The statistical model has an R² of 0.88, which means the variables included in the model explain a lot of the variance in price.
2. What is the price elasticity of demand for avocado? PED came out to be about 1.94 in magnitude, which was expected because avocado should fall in the cohort of elastic products.
3. Is demand dependent only on price, or are there other factors too? There are other factors too: the type of avocado and the quarter of the sale date are a few prominent factors on which the quantity sold depends.
I welcome feedback and constructive criticism. I can be reached on Twitter @prashantmdgl9