Marcos Ycaro Barros Trindade has successfully completed this project.

Intro to Pandas for Data Analysis

medium

4.71

Practice DataFrame Mutations using Airbnb Data

Finished

July 9, 2024 6:28 PM

Elapsed time (min)

Completed activities

Resolution

Activities

Look at the dataset¶

In [1]:

# importing pandas library
import pandas as pd

In [2]:

df = pd.read_csv('AB_US_2023.csv', low_memory=False, parse_dates=['last_review'])
df.head()

Out[2]:

	id	name	host_id	host_name	neighbourhood_group	neighbourhood	latitude	longitude	room_type	price	minimum_nights	number_of_reviews	last_review	reviews_per_month	calculated_host_listings_count	availability_365	number_of_reviews_ltm	city
0	958	Bright, Modern Garden Unit - 1BR/1BTH	1169	Holly	NaN	Western Addition	37.77028	-122.43317	Entire home/apt	202	2	383	2023-02-19	2.31	1	128	59	San Francisco
1	5858	Creative Sanctuary	8904	Philip And Tania	NaN	Bernal Heights	37.74474	-122.42089	Entire home/apt	235	30	111	2017-08-06	0.66	1	365	0	San Francisco
2	8142	Friendly Room Apt. Style -UCSF/USF - San Franc...	21994	Aaron	NaN	Haight Ashbury	37.76555	-122.45213	Private room	56	32	9	2022-10-27	0.09	13	365	1	San Francisco
3	8339	Historic Alamo Square Victorian	24215	Rosy	NaN	Western Addition	37.77564	-122.43642	Entire home/apt	575	9	28	2019-06-28	0.17	2	365	0	San Francisco
4	8739	Mission Sunshine, with Private Bath	7149	Ivan & Wendy	NaN	Mission	37.76030	-122.42197	Private room	110	1	770	2023-02-25	4.65	2	159	34	San Francisco

In [3]:

df.columns

Out[3]:

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'price',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'number_of_reviews_ltm', 'city'],
      dtype='object')

In [4]:

df.shape

Out[4]:

(232147, 18)

In [5]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 232147 entries, 0 to 232146
Data columns (total 18 columns):
 #   Column                          Non-Null Count   Dtype         
---  ------                          --------------   -----         
 0   id                              232147 non-null  int64         
 1   name                            232131 non-null  object        
 2   host_id                         232147 non-null  int64         
 3   host_name                       232134 non-null  object        
 4   neighbourhood_group             96500 non-null   object        
 5   neighbourhood                   232147 non-null  object        
 6   latitude                        232147 non-null  float64       
 7   longitude                       232147 non-null  float64       
 8   room_type                       232147 non-null  object        
 9   price                           232147 non-null  int64         
 10  minimum_nights                  232147 non-null  int64         
 11  number_of_reviews               232147 non-null  int64         
 12  last_review                     183062 non-null  datetime64[ns]
 13  reviews_per_month               183062 non-null  float64       
 14  calculated_host_listings_count  232147 non-null  int64         
 15  availability_365                232147 non-null  int64         
 16  number_of_reviews_ltm           232147 non-null  int64         
 17  city                            232147 non-null  object        
dtypes: datetime64[ns](1), float64(3), int64(8), object(6)
memory usage: 31.9+ MB

In [6]:

df.describe()

Out[6]:

	id	host_id	latitude	longitude	price	minimum_nights	number_of_reviews	last_review	reviews_per_month	calculated_host_listings_count	availability_365	number_of_reviews_ltm
count	2.321470e+05	2.321470e+05	232147.000000	232147.000000	232147.000000	232147.000000	232147.000000	183062	183062.000000	232147.000000	232147.000000	232147.000000
mean	2.584580e+17	1.582248e+08	36.610585	-98.301436	259.468001	13.495867	40.915230	2022-06-01 20:16:18.553714176	1.638348	29.879055	180.985686	11.689701
min	6.000000e+00	2.300000e+01	25.957323	-123.089130	0.000000	1.000000	0.000000	2010-10-04 00:00:00	0.010000	1.000000	0.000000	0.000000
25%	2.638896e+07	2.299242e+07	33.976225	-118.315111	91.000000	2.000000	1.000000	2022-08-14 00:00:00	0.310000	1.000000	52.000000	0.000000
50%	4.896307e+07	1.005783e+08	36.190556	-97.727670	149.000000	3.000000	9.000000	2023-01-20 00:00:00	1.000000	2.000000	175.000000	3.000000
75%	6.633014e+17	2.686930e+08	40.717440	-77.026222	250.000000	30.000000	43.000000	2023-03-04 00:00:00	2.420000	10.000000	321.000000	16.000000
max	8.581014e+17	5.069384e+08	47.734010	-70.996000	100000.000000	1250.000000	3091.000000	2023-03-31 00:00:00	101.420000	1003.000000	365.000000	1314.000000
std	3.465985e+17	1.587164e+08	5.126523	19.706929	1024.645918	27.920631	80.649152	NaN	1.910812	106.013665	134.715299	20.599954

In [7]:

df.isnull().sum()

Out[7]:

id                                     0
name                                  16
host_id                                0
host_name                             13
neighbourhood_group               135647
neighbourhood                          0
latitude                               0
longitude                              0
room_type                              0
price                                  0
minimum_nights                         0
number_of_reviews                      0
last_review                        49085
reviews_per_month                  49085
calculated_host_listings_count         0
availability_365                       0
number_of_reviews_ltm                  0
city                                   0
dtype: int64

Activities¶

1. Create a New Column `price_per_minimum_stay`¶

In [8]:

df['price_per_minimum_stay'] = df["minimum_nights"] * df["price"]

2. Delete all rows where the price is greater than $500¶

In [9]:

df.drop(df.loc[df['price'] > 500].index,inplace = True)

3. Delete the `host_name` and `neighbourhood_group` columns from the DataFrame `df`¶

In [10]:

# try your code here
df.drop(["host_name","neighbourhood_group"],axis=1,inplace=True)

4. Rename the column `number_of_reviews` to `reviews_count`¶

In [11]:

# try your code here
df.rename({"number_of_reviews":"reviews_count"},axis=1,inplace=True)

5. Convert the `price` column from integer to float data type¶

In [12]:

# try your code here
df['price'] = df['price'].astype('float64')

6. Replace all occurrences of `Private room` in the `room_type` column with `Private`¶

In [13]:

# try your code here
df['room_type'].replace("Private room","Private",inplace=True)

/tmp/ipykernel_2387/2373905210.py:2: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  df['room_type'].replace("Private room","Private",inplace=True)

7. Add new row with the given details¶

In [14]:

new_row_data = {'id': 851792795339743534, 'name': 'Tony Stark Apartment', 'host_id': 67890, 'room_type': 'Entire home/apt',
                'price': 150, 'minimum_nights': 3, 'reviews_count': 10}

In [15]:

# try your code here
new_row = pd.DataFrame(new_row_data,index=[len(df)])
df = pd.concat([df,new_row])

8. Remove the `availability_365` column from the DataFrame without creating a new DataFrame¶

In [16]:

# try your code here
del df['availability_365']

9. Sort the DataFrame by the `price` column in descending order¶

In [17]:

# try your code here
sorted_df = df.sort_values(by=['price'],ascending=False)

10. Convert all prices from US dollars to euros¶

In [18]:

# try your code here
df['price_eur'] = df['price'] * 0.85

11. Modify the `price_per_minimum_stay` by doubling the rates.¶

In [22]:

# try your code here
df['price_per_minimum_stay'] = df['price_per_minimum_stay'] * 2

12. Create a new column named `year` that contains the year information from the `last_review` column¶

In [40]:

# try your code here
df['year'] = df['last_review'].dt.year.astype('int32',errors='ignore')

Statement of Completion#e3a29f5b

Intro to Pandas for Data Analysis

Practice DataFrame Mutations using Airbnb Data

Look at the dataset¶

Activities¶

1. Create a New Column `price_per_minimum_stay`¶

2. Delete all rows where the price is greater than $500¶

3. Delete the `host_name` and `neighbourhood_group` columns from the DataFrame `df`¶

4. Rename the column `number_of_reviews` to `reviews_count`¶

5. Convert the `price` column from integer to float data type¶

6. Replace all occurrences of `Private room` in the `room_type` column with `Private`¶

7. Add new row with the given details¶

8. Remove the `availability_365` column from the DataFrame without creating a new DataFrame¶

9. Sort the DataFrame by the `price` column in descending order¶

10. Convert all prices from US dollars to euros¶

11. Modify the `price_per_minimum_stay` by doubling the rates.¶

12. Create a new column named `year` that contains the year information from the `last_review` column¶

The End!¶

Statement of Completion#e3a29f5b

Intro to Pandas for Data Analysis

Practice DataFrame Mutations using Airbnb Data

Look at the dataset¶

Activities¶

1. Create a New Column price_per_minimum_stay¶

2. Delete all rows where the price is greater than $500¶

3. Delete the host_name and neighbourhood_group columns from the DataFrame df¶

4. Rename the column number_of_reviews to reviews_count¶

5. Convert the price column from integer to float data type¶

6. Replace all occurrences of Private room in the room_type column with Private¶

7. Add new row with the given details¶

8. Remove the availability_365 column from the DataFrame without creating a new DataFrame¶

9. Sort the DataFrame by the price column in descending order¶

10. Convert all prices from US dollars to euros¶

11. Modify the price_per_minimum_stay by doubling the rates.¶

12. Create a new column named year that contains the year information from the last_review column¶

The End!¶

1. Create a New Column `price_per_minimum_stay`¶

3. Delete the `host_name` and `neighbourhood_group` columns from the DataFrame `df`¶

4. Rename the column `number_of_reviews` to `reviews_count`¶

5. Convert the `price` column from integer to float data type¶

6. Replace all occurrences of `Private room` in the `room_type` column with `Private`¶

8. Remove the `availability_365` column from the DataFrame without creating a new DataFrame¶

9. Sort the DataFrame by the `price` column in descending order¶

11. Modify the `price_per_minimum_stay` by doubling the rates.¶

12. Create a new column named `year` that contains the year information from the `last_review` column¶