Statement of Completion#e3a29f5b
Intro to Pandas for Data Analysis
medium
Practice DataFrame Mutations using Airbnb Data
Resolution
Activities
Look at the dataset¶
In [1]:
# importing pandas library
import pandas as pd
In [2]:
df = pd.read_csv('AB_US_2023.csv', low_memory=False, parse_dates=['last_review'])
df.head()
Out[2]:
id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | city | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 958 | Bright, Modern Garden Unit - 1BR/1BTH | 1169 | Holly | NaN | Western Addition | 37.77028 | -122.43317 | Entire home/apt | 202 | 2 | 383 | 2023-02-19 | 2.31 | 1 | 128 | 59 | San Francisco |
1 | 5858 | Creative Sanctuary | 8904 | Philip And Tania | NaN | Bernal Heights | 37.74474 | -122.42089 | Entire home/apt | 235 | 30 | 111 | 2017-08-06 | 0.66 | 1 | 365 | 0 | San Francisco |
2 | 8142 | Friendly Room Apt. Style -UCSF/USF - San Franc... | 21994 | Aaron | NaN | Haight Ashbury | 37.76555 | -122.45213 | Private room | 56 | 32 | 9 | 2022-10-27 | 0.09 | 13 | 365 | 1 | San Francisco |
3 | 8339 | Historic Alamo Square Victorian | 24215 | Rosy | NaN | Western Addition | 37.77564 | -122.43642 | Entire home/apt | 575 | 9 | 28 | 2019-06-28 | 0.17 | 2 | 365 | 0 | San Francisco |
4 | 8739 | Mission Sunshine, with Private Bath | 7149 | Ivan & Wendy | NaN | Mission | 37.76030 | -122.42197 | Private room | 110 | 1 | 770 | 2023-02-25 | 4.65 | 2 | 159 | 34 | San Francisco |
In [3]:
df.columns
Out[3]:
Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group', 'neighbourhood', 'latitude', 'longitude', 'room_type', 'price', 'minimum_nights', 'number_of_reviews', 'last_review', 'reviews_per_month', 'calculated_host_listings_count', 'availability_365', 'number_of_reviews_ltm', 'city'], dtype='object')
In [4]:
df.shape
Out[4]:
(232147, 18)
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 232147 entries, 0 to 232146 Data columns (total 18 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 id 232147 non-null int64 1 name 232131 non-null object 2 host_id 232147 non-null int64 3 host_name 232134 non-null object 4 neighbourhood_group 96500 non-null object 5 neighbourhood 232147 non-null object 6 latitude 232147 non-null float64 7 longitude 232147 non-null float64 8 room_type 232147 non-null object 9 price 232147 non-null int64 10 minimum_nights 232147 non-null int64 11 number_of_reviews 232147 non-null int64 12 last_review 183062 non-null datetime64[ns] 13 reviews_per_month 183062 non-null float64 14 calculated_host_listings_count 232147 non-null int64 15 availability_365 232147 non-null int64 16 number_of_reviews_ltm 232147 non-null int64 17 city 232147 non-null object dtypes: datetime64[ns](1), float64(3), int64(8), object(6) memory usage: 31.9+ MB
In [6]:
df.describe()
Out[6]:
id | host_id | latitude | longitude | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 2.321470e+05 | 2.321470e+05 | 232147.000000 | 232147.000000 | 232147.000000 | 232147.000000 | 232147.000000 | 183062 | 183062.000000 | 232147.000000 | 232147.000000 | 232147.000000 |
mean | 2.584580e+17 | 1.582248e+08 | 36.610585 | -98.301436 | 259.468001 | 13.495867 | 40.915230 | 2022-06-01 20:16:18.553714176 | 1.638348 | 29.879055 | 180.985686 | 11.689701 |
min | 6.000000e+00 | 2.300000e+01 | 25.957323 | -123.089130 | 0.000000 | 1.000000 | 0.000000 | 2010-10-04 00:00:00 | 0.010000 | 1.000000 | 0.000000 | 0.000000 |
25% | 2.638896e+07 | 2.299242e+07 | 33.976225 | -118.315111 | 91.000000 | 2.000000 | 1.000000 | 2022-08-14 00:00:00 | 0.310000 | 1.000000 | 52.000000 | 0.000000 |
50% | 4.896307e+07 | 1.005783e+08 | 36.190556 | -97.727670 | 149.000000 | 3.000000 | 9.000000 | 2023-01-20 00:00:00 | 1.000000 | 2.000000 | 175.000000 | 3.000000 |
75% | 6.633014e+17 | 2.686930e+08 | 40.717440 | -77.026222 | 250.000000 | 30.000000 | 43.000000 | 2023-03-04 00:00:00 | 2.420000 | 10.000000 | 321.000000 | 16.000000 |
max | 8.581014e+17 | 5.069384e+08 | 47.734010 | -70.996000 | 100000.000000 | 1250.000000 | 3091.000000 | 2023-03-31 00:00:00 | 101.420000 | 1003.000000 | 365.000000 | 1314.000000 |
std | 3.465985e+17 | 1.587164e+08 | 5.126523 | 19.706929 | 1024.645918 | 27.920631 | 80.649152 | NaN | 1.910812 | 106.013665 | 134.715299 | 20.599954 |
In [7]:
df.isnull().sum()
Out[7]:
id 0 name 16 host_id 0 host_name 13 neighbourhood_group 135647 neighbourhood 0 latitude 0 longitude 0 room_type 0 price 0 minimum_nights 0 number_of_reviews 0 last_review 49085 reviews_per_month 49085 calculated_host_listings_count 0 availability_365 0 number_of_reviews_ltm 0 city 0 dtype: int64
Activities¶
1. Create a New Column price_per_minimum_stay
¶
In [8]:
df['price_per_minimum_stay'] = df["minimum_nights"] * df["price"]
2. Delete all rows where the price is greater than $500¶
In [9]:
df.drop(df.loc[df['price'] > 500].index,inplace = True)
3. Delete the host_name
and neighbourhood_group
columns from the DataFrame df
¶
In [10]:
# try your code here
df.drop(["host_name","neighbourhood_group"],axis=1,inplace=True)
4. Rename the column number_of_reviews
to reviews_count
¶
In [11]:
# try your code here
df.rename({"number_of_reviews":"reviews_count"},axis=1,inplace=True)
5. Convert the price
column from integer to float data type¶
In [12]:
# try your code here
df['price'] = df['price'].astype('float64')
6. Replace all occurrences of Private room
in the room_type
column with Private
¶
In [13]:
# try your code here
df['room_type'].replace("Private room","Private",inplace=True)
/tmp/ipykernel_2387/2373905210.py:2: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method. The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy. For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object. df['room_type'].replace("Private room","Private",inplace=True)
7. Add new row with the given details¶
In [14]:
new_row_data = {'id': 851792795339743534, 'name': 'Tony Stark Apartment', 'host_id': 67890, 'room_type': 'Entire home/apt',
'price': 150, 'minimum_nights': 3, 'reviews_count': 10}
In [15]:
# try your code here
new_row = pd.DataFrame(new_row_data,index=[len(df)])
df = pd.concat([df,new_row])
8. Remove the availability_365
column from the DataFrame without creating a new DataFrame¶
In [16]:
# try your code here
del df['availability_365']
9. Sort the DataFrame by the price
column in descending order¶
In [17]:
# try your code here
sorted_df = df.sort_values(by=['price'],ascending=False)
10. Convert all prices from US dollars to euros¶
In [18]:
# try your code here
df['price_eur'] = df['price'] * 0.85
11. Modify the price_per_minimum_stay
by doubling the rates.¶
In [22]:
# try your code here
df['price_per_minimum_stay'] = df['price_per_minimum_stay'] * 2
12. Create a new column named year
that contains the year information from the last_review
column¶
In [40]:
# try your code here
df['year'] = df['last_review'].dt.year.astype('int32',errors='ignore')