

Module 1 : Intro to the Problem
In Module 1, we are going to get familiar with pandas, the Python module we use to process and analyze data. Processing could include removing unknown values from the data or replacing them with values that make sense, such as 0. Analyzing the data could include finding the trend of the stock price, e.g. how the stock price changes relative to the Nifty 50 basket of stocks.
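As a small illustration of the processing step described above, here is a minimal sketch of the two options for unknown values: dropping them or replacing them with 0. The toy data and the column names are made up for the example and are not taken from the actual stock file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "Close Price": [100.0, np.nan, 102.0],
    "Volume": [10.0, 20.0, np.nan],
})

# Option 1: remove every row that contains at least one unknown value.
dropped = df.dropna()

# Option 2: keep all rows and replace unknown values with 0.
filled = df.fillna(0)

print(dropped)
print(filled)
```

Which option makes sense depends on the column: a 0 may be a reasonable stand-in for a missing volume, but it would distort an average of prices.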
Problem Statements :
1. Import the csv file of the Hindustan Unilever stock. Shares of a company can be offered in more than one category. The category of the stock is indicated in the ‘Series’ column. If the csv file has data on more than one category, the ‘Date’ column will have repeating values. To avoid repetitions in the date, remove all the rows where the ‘Series’ column is NOT “EQ”. Analyze and understand each column properly. One can use head() and tail() for this.
2. Calculate the maximum, minimum and mean price for the last 90 days. (price = closing price unless stated otherwise)
3. Analyze the data types for each column in the dataframe. Pandas knows how to deal with dates in an intelligent manner. But to make use of Pandas functionality for dates, you need to ensure that the column is of type ‘datetime64[ns]’. Change the date column from ‘object’ type to ‘datetime64[ns]’ for future convenience. See what happens if we subtract the minimum value of the date column from the maximum value.
4. In a separate array, calculate the month-wise VWAP (Volume Weighted Average Price).
5. Write a function to calculate the average price over the last N days of the stock price data, where N is a user-defined parameter. Write a second function to calculate the profit/loss percentage over the last N days. Calculate the average price AND the profit/loss percentage over the last 1 week, 2 weeks, 1 month, 3 months, 6 months and 1 year. {Note : The profit/loss percentage over N days is the percentage change between the closing prices of the first and last of those days.}
6. Add a column ‘Day_Perc_Change’ whose values are the daily change in percentages, i.e. the percentage change between two consecutive days’ closing prices. Instead of using the basic mathematical formula to compute this, use the ‘pct_change()’ function provided by Pandas for dataframes. You will note that the first entry of the column has a ‘NaN’ value. Why does this happen? Either remove the first row, or set the entry to 0 before proceeding.
7. Add another column ‘Trend’ whose values are:
- ‘Slight or No change’ for ‘Day_Perc_Change’ between -0.005 and 0.005
- ‘Slight positive’ for ‘Day_Perc_Change’ between 0.005 and 0.01
- ‘Slight negative’ for ‘Day_Perc_Change’ between -0.005 and -0.01
- ‘Positive’ for ‘Day_Perc_Change’ between 0.01 and 0.03
- ‘Negative’ for ‘Day_Perc_Change’ between -0.01 and -0.03
- ‘Among top gainers’ for ‘Day_Perc_Change’ between 0.03 and 0.07
- ‘Among top losers’ for ‘Day_Perc_Change’ between -0.03 and -0.07
- ‘Bull run’ for ‘Day_Perc_Change’ > 0.07
- ‘Bear drop’ for ‘Day_Perc_Change’ < -0.07
8. Find the average and median values of the column ‘Total Traded Quantity’ for each of the types of ‘Trend’.
9. Save the dataframe with the additional columns computed as a csv file, week2.csv.
In Module 2, you are going to get familiar with matplotlib, the Python module which is used to visualize data.
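To make the Module 1 problem statements more concrete, the following is a rough sketch of one possible approach, run on a tiny synthetic dataframe rather than the real csv file. Several things here are assumptions and not part of the assignment: the column name ‘Close Price’, the use of the closing price as the price in the VWAP, the reading of "last N days" as calendar days counted back from the latest date, the helper names last_n_days, average_price and profit_loss_pct, and the handling of values that fall exactly on a ‘Trend’ boundary (the problem statement does not say which side they belong to).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real csv; 'Close Price' is an assumed column name.
df = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-03", "2019-02-01"],
    "Series": ["EQ", "BL", "EQ", "EQ", "EQ"],
    "Close Price": [100.0, 101.0, 102.0, 99.96, 110.0],
    "Total Traded Quantity": [1000, 50, 2000, 1500, 3000],
})

# Problem 1: keep only the 'EQ' series so that dates no longer repeat.
df = df[df["Series"] == "EQ"].reset_index(drop=True)

# Problem 3: convert 'Date' from object to datetime64[ns]; max - min is a Timedelta.
df["Date"] = pd.to_datetime(df["Date"])
span = df["Date"].max() - df["Date"].min()

# Problem 4: month-wise VWAP = sum(price * quantity) / sum(quantity) per month.
# Assumption: the closing price is used as the price.
month = df["Date"].dt.to_period("M")
turnover = df["Close Price"] * df["Total Traded Quantity"]
vwap = turnover.groupby(month).sum() / df["Total Traded Quantity"].groupby(month).sum()

# Problem 5 (hypothetical helpers): "last N days" is read as calendar days.
def last_n_days(frame, n):
    cutoff = frame["Date"].max() - pd.Timedelta(days=n)
    return frame[frame["Date"] >= cutoff].sort_values("Date")

def average_price(frame, n):
    return last_n_days(frame, n)["Close Price"].mean()

def profit_loss_pct(frame, n):
    window = last_n_days(frame, n)["Close Price"]
    return (window.iloc[-1] - window.iloc[0]) / window.iloc[0] * 100

# Problem 6: pct_change() returns fractions (0.02 means 2 %); the first row is
# NaN because it has no previous day to compare with, so it is set to 0 here.
df["Day_Perc_Change"] = df["Close Price"].pct_change().fillna(0)

# Problem 7: bucket the daily change into 'Trend' labels with pd.cut.
# Assumption: each bin includes its upper edge (pd.cut's default, right=True).
bins = [-np.inf, -0.07, -0.03, -0.01, -0.005, 0.005, 0.01, 0.03, 0.07, np.inf]
labels = ["Bear drop", "Among top losers", "Negative", "Slight negative",
          "Slight or No change", "Slight positive", "Positive",
          "Among top gainers", "Bull run"]
df["Trend"] = pd.cut(df["Day_Perc_Change"], bins=bins, labels=labels)

# Problem 8: mean and median traded quantity for each trend that occurs.
stats = df.groupby("Trend", observed=True)["Total Traded Quantity"].agg(["mean", "median"])

# Problem 9: save the dataframe with the extra columns.
df.to_csv("week2.csv", index=False)

print(span, vwap, stats, sep="\n")
```

With the real data, the first step would instead be pd.read_csv on the downloaded file, and the column names should be checked with head() before reusing any of the names above.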