This is my Linear Regression project: analysing car data and predicting car prices from it. I used the Pandas, NumPy, statsmodels, Seaborn, and scikit-learn libraries to analyse the data and build the model.
I loaded the dataset into a variable called ‘data’ and inspected it with the ‘head’ and ‘transpose’ methods.
After inspecting the data, I realized that many columns were of no use to the model, so I removed them with the ‘drop’ method.
Next came cleaning the data. Many entries in the data frame were ‘?’ placeholders, so I replaced them with NaN values.
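The placeholder replacement can be sketched as follows. This is a minimal illustration on a made-up three-row frame; the column names and values are assumptions, not the project's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the car data; names and values are assumptions.
data = pd.DataFrame({
    "horsepower": ["111", "?", "154"],
    "price": ["13495", "16500", "?"],
})

# Swap every '?' placeholder for NaN so pandas treats it as missing.
data = data.replace("?", np.nan)

# Count the missing entries per column to confirm the replacement worked.
missing_per_column = data.isna().sum()
print(missing_per_column)
```

Once the placeholders are NaN, the usual pandas tools for missing data (‘isna’, ‘fillna’, ‘dropna’) apply to them.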
After that, I created a new column called ‘cylinders’ to store the string values of the ‘num_of_cylinders’ column as numbers, and converted the string values of the remaining columns to floats with the ‘astype’ method.
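One possible way to do this conversion is shown below. The word-to-number dictionary and the sample rows are assumptions made for illustration; the project may have built the ‘cylinders’ column differently.

```python
import pandas as pd

# Hypothetical sample rows; only 'num_of_cylinders' is named in the text.
data = pd.DataFrame({
    "num_of_cylinders": ["four", "six", "four", "eight"],
    "horsepower": ["111", "154", "102", "184"],
})

# Assumed mapping from the spelled-out counts to integers.
cylinder_words = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}

# New numeric column derived from the string column.
data["cylinders"] = data["num_of_cylinders"].map(cylinder_words)

# Numeric strings in the other columns become floats.
data["horsepower"] = data["horsepower"].astype(float)

print(data.dtypes)
```

Any word missing from the dictionary would come out of ‘map’ as NaN, so it is worth checking the new column for missing values afterwards.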
I replaced the NaN values in every column with that column's median and visualized the data with the ‘pairplot’ function from the Seaborn library.
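The median fill can be sketched like this, again on a made-up frame whose column names and values are assumptions. The plotting step is left out to keep the example short; with Seaborn imported as ‘sns’ it would be a call to ‘sns.pairplot(data)’.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one gap per column.
data = pd.DataFrame({
    "horsepower": [111.0, np.nan, 154.0],
    "price": [13495.0, 16500.0, np.nan],
})

# data.median() returns one median per column (NaN values are skipped),
# and fillna matches those medians to the columns by name.
data = data.fillna(data.median())

print(data)
```

Here the gap in ‘horsepower’ becomes 132.5 and the gap in ‘price’ becomes 14997.5, the medians of the two known values in each column.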
I split the data into independent variables (x) and a target variable (y). Since my goal is to predict the price, I took price as ‘y’ and the remaining columns as ‘x’.
I then split ‘x’ and ‘y’ into training and test sets with the ‘train_test_split’ function from scikit-learn, using 25% of the data as the test set and the remaining 75% as the training set.
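The two splits described above might look roughly like this. The data is synthetic and the column names other than ‘price’ are assumptions; ‘random_state’ is added only so the sketch is repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned car data (40 rows).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "horsepower": rng.uniform(50, 250, 40),
    "cylinders": rng.choice([4, 6, 8], 40),
})
data["price"] = 5000 + 100 * data["horsepower"] + 500 * data["cylinders"]

# Target is the price; everything else is an independent variable.
x = data.drop(columns="price")
y = data["price"]

# Hold out 25% of the rows for testing, keep 75% for training.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=1
)

print(x_train.shape, x_test.shape)
```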
I found the best-fit line by calling the ‘fit’ method of scikit-learn's Linear Regression on the training set of ‘x’ and ‘y’.
I printed the coefficients (m) and the intercept (c) of the model using the ‘coef_’ and ‘intercept_’ attributes respectively.
You can check the score of the model with the ‘score’ method of Linear Regression, which returns the R² value on the data you pass in.
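Fitting, inspecting, and scoring the model can be sketched as below. The data is synthetic and noise-free, generated from known coefficients (an assumption made purely so the fitted values are easy to check), so it says nothing about the real car data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data: y = 3*x1 + 2*x2 + 5, with no noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(40, 2))
y = 3.0 * x[:, 0] + 2.0 * x[:, 1] + 5.0

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=1
)

# Fit ordinary least squares on the training set.
model = LinearRegression()
model.fit(x_train, y_train)

# coef_ holds one slope per feature; intercept_ is the constant term.
print(model.coef_)
print(model.intercept_)

# score returns R^2 on the data it is given, here the held-out test set.
r_squared = model.score(x_test, y_test)
print(r_squared)
```

Scoring on the test set rather than the training set gives a fairer picture of how the model handles data it has not seen.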
I used the statsmodels library to further analyse and improve the model. Its formula interface expects the target variable (y) and the independent variables (x) in the same data frame, so I concatenated ‘x’ and ‘y’ into one data frame.
I found the best-fit line by passing a formula to the ‘ols’ function and calling ‘fit’ on the result.
We can get the coefficients (m) and intercept (c) of this best-fit line from the ‘params’ attribute.
For more detailed information, we can call the ‘summary’ method.
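The statsmodels steps can be sketched as follows, assuming the formula interface (‘statsmodels.formula.api’) is the one used. The data is synthetic, and the column names and the formula string are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the training data.
rng = np.random.default_rng(0)
x = pd.DataFrame({
    "horsepower": rng.uniform(50, 250, 40),
    "cylinders": rng.choice([4, 6, 8], 40).astype(float),
})
y = pd.Series(
    5000 + 100 * x["horsepower"] + 500 * x["cylinders"] + rng.normal(0, 10, 40),
    name="price",
)

# The formula interface looks up every name in one data frame,
# so x and y are concatenated column-wise first.
train = pd.concat([x, y], axis=1)

# Target on the left of '~', independent variables on the right.
results = smf.ols(formula="price ~ horsepower + cylinders", data=train).fit()

# params holds the intercept and one coefficient per variable.
print(results.params)

# summary() reports R-squared, standard errors, p-values and more.
print(results.summary())
```

The p-values in the summary are one place to look when deciding which variables to keep in the formula.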
I did all this to practice Linear Regression. These are the things I learned from the project:
Learned how to identify columns that are of no use to the model and remove them.
Learned how to analyse data from a pair plot.
Learned how to use the statsmodels library to analyse and improve the model.
Project Github: Car Price Prediction