What features affect the price of Airbnb listings in Boston?

Airbnb is an American online marketplace for vacation rentals, based in San Francisco, California. The company maintains and hosts the marketplace, which consumers access through its website or app. Hosts can list their spare rooms, whole properties, or parts of them for rental, while travelers arrange lodging (primarily homestays) and tourism experiences by searching properties and rooms by neighborhood or location. Airbnb recommends the best price in the neighborhood, and users book the best deal.

Thanks to Kaggle and Udacity, I got a chance to analyze Airbnb listings for the city of Boston. The Boston Airbnb listings dataset has various features such as neighborhood, property_type, bedrooms, bathrooms, beds, price, reviews, ratings, etc. It would be interesting to see which features affect the price in Boston and what conclusions we can draw. I am even more interested in training and evaluating a model to see how well it predicts Airbnb prices in Boston.

To understand the dataset we have to explore it. Python, pandas, NumPy, Matplotlib, Seaborn, and scikit-learn (sklearn) make data science work much easier. Pandas is excellent for loading, cleaning, and transforming datasets. Seaborn is a handy package for visualizing the results of pandas transformations; it offers high-level functions to plot bar charts, histograms, distributions, box plots, etc. I will be using all these packages to explore the data through the following activities:

Import packages and read Boston Airbnb datasets
Data cleaning and transformation
Numerical features analysis
Categorical features analysis

I import NumPy and pandas for linear algebra and data processing respectively, matplotlib's pyplot and seaborn for plotting the dataset, and sklearn packages for training and evaluating a model.

After importing all the necessary packages, let’s load the Boston Airbnb listings dataset into memory. The pandas read_csv function makes reading CSV files easy. It takes the file path along with other optional parameters and returns a DataFrame object.
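The loading step can be sketched roughly as follows. The file name listings.csv and the helper load_listings are assumptions made for illustration, and index_col="id" is only inferred from the integer index that appears in the info output further down. A tiny made-up CSV stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up two-row stand-in for the real file so the sketch runs on its own.
SAMPLE_CSV = io.StringIO(
    "id,name,price,bedrooms\n"
    "12147973,Sunny room,$250.00,2\n"
    "14504422,Cozy studio,$65.00,1\n"
)


def load_listings(source):
    """Read a listings CSV into a DataFrame indexed by listing id."""
    # index_col="id" is an assumption based on the integer index in the info output.
    return pd.read_csv(source, index_col="id")


# For the real data this would be something like load_listings("listings.csv").
listings = load_listings(SAMPLE_CSV)
print(listings.head())
```

read_csv accepts either a path or a file-like object, which is why the same helper works for the in-memory sample and for a file on disk.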

Exploring datasets is one of my favorite data science activities. It reveals lots of interesting and sometimes surprising facts about the features of the dataset. Moreover, it helps to identify the features that most affect the target variable. Pandas has some handy tools for this: the shape attribute returns the number of rows and columns of the dataset, and the info method outputs the full list of columns with their data types and non-null counts, along with the total number of rows and columns. These help me understand the nature of the features.
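As a quick illustration of shape and info, here is a minimal sketch on a small made-up frame (the values are invented; only the column names come from the listings dataset):

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the listings data.
df = pd.DataFrame(
    {
        "price": ["$250.00", "$65.00", "$120.00"],
        "bedrooms": [2.0, 1.0, np.nan],
        "square_feet": [np.nan, np.nan, 500.0],
    }
)

n_rows, n_cols = df.shape  # shape is an attribute, not a method call
print(f"{n_rows} rows, {n_cols} columns")

df.info()  # prints column names, non-null counts and dtypes

# count() gives the same non-null counts as a Series, which is easier to filter on.
non_null_counts = df.count()
print(non_null_counts)
```

Note that info prints its report rather than returning it, so count is the more convenient call when the non-null numbers are needed in later code.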

As part of this activity, I would like to initially clean the dataset followed by simple transformations and then perform Numerical and Categorical features analysis.

Observation:

The Boston Airbnb listings dataset has 3585 rows and 94 columns. That is a lot of columns, so we need to know more about the column types and null value counts before we can clean the data.

Int64Index: 3585 entries, 12147973 to 14504422
Data columns (total 94 columns):
#   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
0   listing_url                       3585 non-null   object 
1   scrape_id                         3585 non-null   int64  
2   last_scraped                      3585 non-null   object 
3   name                              3585 non-null   object 
4   summary                           3442 non-null   object 
5   space                             2528 non-null   object 
6   description                       3585 non-null   object 
7   experiences_offered               3585 non-null   object 
8   neighborhood_overview             2170 non-null   object 
9   notes                             1610 non-null   object 
10  transit                           2295 non-null   object 
11  access                            2096 non-null   object 
12  interaction                       2031 non-null   object 
13  house_rules                       2393 non-null   object 
14  thumbnail_url                     2986 non-null   object 
15  medium_url                        2986 non-null   object 
16  picture_url                       3585 non-null   object 
17  xl_picture_url                    2986 non-null   object 
18  host_id                           3585 non-null   int64  
19  host_url                          3585 non-null   object 
20  host_name                         3585 non-null   object 
21  host_since                        3585 non-null   object 
22  host_location                     3574 non-null   object 
23  host_about                        2276 non-null   object 
24  host_response_time                3114 non-null   object 
25  host_response_rate                3114 non-null   object 
26  host_acceptance_rate              3114 non-null   object 
27  host_is_superhost                 3585 non-null   object 
28  host_thumbnail_url                3585 non-null   object 
29  host_picture_url                  3585 non-null   object 
30  host_neighbourhood                3246 non-null   object 
31  host_listings_count               3585 non-null   int64  
32  host_total_listings_count         3585 non-null   int64  
33  host_verifications                3585 non-null   object 
34  host_has_profile_pic              3585 non-null   object 
35  host_identity_verified            3585 non-null   object 
36  street                            3585 non-null   object 
37  neighbourhood                     3042 non-null   object 
38  neighbourhood_cleansed            3585 non-null   object 
39  neighbourhood_group_cleansed      0 non-null      float64
40  city                              3583 non-null   object 
41  state                             3585 non-null   object 
42  zipcode                           3547 non-null   object 
43  market                            3571 non-null   object 
44  smart_location                    3585 non-null   object 
45  country_code                      3585 non-null   object 
46  country                           3585 non-null   object 
47  latitude                          3585 non-null   float64
48  longitude                         3585 non-null   float64
49  is_location_exact                 3585 non-null   object 
50  property_type                     3582 non-null   object 
51  room_type                         3585 non-null   object 
52  accommodates                      3585 non-null   int64  
53  bathrooms                         3571 non-null   float64
54  bedrooms                          3575 non-null   float64
55  beds                              3576 non-null   float64
56  bed_type                          3585 non-null   object 
57  amenities                         3585 non-null   object 
58  square_feet                       56 non-null     float64
59  price                             3585 non-null   object 
60  weekly_price                      892 non-null    object 
61  monthly_price                     888 non-null    object 
62  security_deposit                  1342 non-null   object 
63  cleaning_fee                      2478 non-null   object 
64  guests_included                   3585 non-null   int64  
65  extra_people                      3585 non-null   object 
66  minimum_nights                    3585 non-null   int64  
67  maximum_nights                    3585 non-null   int64  
68  calendar_updated                  3585 non-null   object 
69  has_availability                  0 non-null      float64
70  availability_30                   3585 non-null   int64  
71  availability_60                   3585 non-null   int64  
72  availability_90                   3585 non-null   int64  
73  availability_365                  3585 non-null   int64  
74  calendar_last_scraped             3585 non-null   object 
75  number_of_reviews                 3585 non-null   int64  
76  first_review                      2829 non-null   object 
77  last_review                       2829 non-null   object 
78  review_scores_rating              2772 non-null   float64
79  review_scores_accuracy            2762 non-null   float64
80  review_scores_cleanliness         2767 non-null   float64
81  review_scores_checkin             2765 non-null   float64
82  review_scores_communication       2767 non-null   float64
83  review_scores_location            2763 non-null   float64
84  review_scores_value               2764 non-null   float64
85  requires_license                  3585 non-null   object 
86  license                           0 non-null      float64
87  jurisdiction_names                0 non-null      float64
88  instant_bookable                  3585 non-null   object 
89  cancellation_policy               3585 non-null   object 
90  require_guest_profile_picture     3585 non-null   object 
91  require_guest_phone_verification  3585 non-null   object 
92  calculated_host_listings_count    3585 non-null   int64  
93  reviews_per_month                 2829 non-null   float64
dtypes: float64(18), int64(14), object(62)

Observations:

Some columns have very few non-null values: neighbourhood_group_cleansed, has_availability, license, and jurisdiction_names are completely empty, and square_feet has only 56 values. I am going to remove these columns from the dataset.
There are columns such as host_url, medium_url, picture_url, etc. that are not useful for predicting price and should be removed.
There are columns such as price, cleaning_fee, security_deposit, host_response_rate, etc. that are of type object. These columns can be converted to a numeric type.
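One way to find the mostly empty columns programmatically is to look at the share of nulls per column. The sketch below is an illustration only: the helper name sparse_columns and the 50% threshold are assumptions, not values taken from the original analysis, and the frame is made up.

```python
import numpy as np
import pandas as pd


def sparse_columns(df, max_null_fraction=0.5):
    """Return names of columns whose share of nulls exceeds max_null_fraction."""
    # isnull().mean() gives the fraction of missing values in each column.
    null_fraction = df.isnull().mean()
    return null_fraction[null_fraction > max_null_fraction].index.tolist()


# Made-up frame: square_feet is mostly empty, license is completely empty.
toy = pd.DataFrame(
    {
        "price": ["$250.00", "$65.00", "$120.00", "$90.00"],
        "square_feet": [np.nan, np.nan, np.nan, 500.0],
        "license": [np.nan] * 4,
    }
)

to_drop = sparse_columns(toy)
print(to_drop)
```

The threshold is a judgment call; a stricter or looser cut-off changes which columns survive.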

Based on the above observations, I am going to write a function that uses pandas high-level functions to drop columns that are not useful, drop columns with too few values, fill NA values, and convert some object type columns to numeric columns. This will clean the data and make it easier to interpret.
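A function along these lines might look like the sketch below. It is not the function used in the original analysis: the name clean_listings, the 50% null threshold, the median fill, and the assumed string formats ("$1,250.00" for money columns, "95%" for rates) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Object columns assumed to hold numbers formatted as "$1,250.00" or "95%".
NUMERIC_LIKE = ("price", "cleaning_fee", "security_deposit", "host_response_rate")


def clean_listings(df, drop_cols=(), max_null_fraction=0.5):
    """Sketch of a cleaning pass: drop columns, convert types, fill gaps."""
    # Drop columns judged not useful (URLs and similar).
    out = df.drop(columns=list(drop_cols), errors="ignore")

    # Drop columns that are mostly empty (threshold is an assumption).
    null_fraction = out.isnull().mean()
    out = out.loc[:, null_fraction <= max_null_fraction].copy()

    # Strip "$", "," and "%" and convert to numbers; unparseable values become NaN.
    for col in NUMERIC_LIKE:
        if col in out.columns:
            stripped = out[col].astype(str).str.replace(r"[$,%]", "", regex=True)
            out[col] = pd.to_numeric(stripped, errors="coerce")

    # Fill remaining numeric gaps with the column median (one possible choice).
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Made-up rows using column names from the listings dataset.
raw = pd.DataFrame(
    {
        "host_url": ["u1", "u2", "u3", "u4"],
        "price": ["$1,250.00", "$65.00", "$120.00", "$90.00"],
        "cleaning_fee": ["$50.00", None, "$30.00", "$10.00"],
        "host_response_rate": ["100%", "90%", None, "80%"],
        "license": [np.nan] * 4,
    }
)

clean = clean_listings(raw, drop_cols=["host_url"])
print(clean.dtypes)
```

Filling with the median is only one option; filling with zero or the mean, or dropping the rows, would be equally valid depending on the column.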
