Intro to Pandas for Data Analysis
Practicing Series Vectorized Operations with Penguins Data
Look at the dataset
In [1]:
import pandas as pd
In [2]:
# Read the dataset into a DataFrame
df = pd.read_csv('penguins_cleaned.csv')
df
Out[2]:
|  | species | island | culmen_length_mm | culmen_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 3 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |
| 4 | Adelie | Torgersen | 39.3 | 20.6 | 190.0 | 3650.0 | MALE |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 328 | Gentoo | Biscoe | 47.2 | 13.7 | 214.0 | 4925.0 | FEMALE |
| 329 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | FEMALE |
| 330 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | MALE |
| 331 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | FEMALE |
| 332 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | MALE |
333 rows × 7 columns
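`pd.read_csv` accepts a file-like object as well as a path, so the loading step can be tried without the CSV file. A minimal sketch, assuming the file uses the seven column names shown in the table above; the three data rows are copied from that table, and `io.StringIO` stands in for `penguins_cleaned.csv`:

```python
import io

import pandas as pd

# Stand-in for penguins_cleaned.csv: header plus three rows from the table above
csv_text = """species,island,culmen_length_mm,culmen_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
Gentoo,Biscoe,47.2,13.7,214.0,4925.0,FEMALE
"""

sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)   # (3, 7)
print(sample_df.dtypes)  # numeric columns are parsed as float64
```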
In [3]:
# Select each column as a pandas Series (df['col'] returns a Series)
species = df['species']
island = df['island']
culmen_length_mm = df['culmen_length_mm']
culmen_depth_mm = df['culmen_depth_mm']
flipper_length_mm = df['flipper_length_mm']
body_mass_g = df['body_mass_g']
gender = df['sex']
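Selecting a single column with bracket notation returns a `pd.Series` that keeps the DataFrame's index and takes the column label as its `name`. A minimal sketch, assuming a small hand-built DataFrame (`sample` is a hypothetical name) holding the first three rows shown above instead of the full CSV:

```python
import pandas as pd

# Hand-built sample: the first three rows of the table above (not the full CSV)
sample = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie"],
    "culmen_length_mm": [39.1, 39.5, 40.3],
    "body_mass_g": [3750.0, 3800.0, 3250.0],
})

# One label in single brackets returns a Series
mass = sample["body_mass_g"]
print(type(mass).__name__)  # Series
print(mass.name)            # body_mass_g
print(mass.index.tolist())  # [0, 1, 2]

# A list of labels (double brackets) returns a one-column DataFrame instead
print(type(sample[["body_mass_g"]]).__name__)  # DataFrame
```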
In [4]:
print("Species: ", species)
Species: 0 Adelie
1 Adelie
2 Adelie
3 Adelie
4 Adelie
...
328 Gentoo
329 Gentoo
330 Gentoo
331 Gentoo
332 Gentoo
Name: species, Length: 333, dtype: object
In [5]:
print("Island: ", island)
Island: 0 Torgersen
1 Torgersen
2 Torgersen
3 Torgersen
4 Torgersen
...
328 Biscoe
329 Biscoe
330 Biscoe
331 Biscoe
332 Biscoe
Name: island, Length: 333, dtype: object
In [6]:
print("Culmen Length (mm): ", culmen_length_mm)
Culmen Length (mm): 0 39.1
1 39.5
2 40.3
3 36.7
4 39.3
...
328 47.2
329 46.8
330 50.4
331 45.2
332 49.9
Name: culmen_length_mm, Length: 333, dtype: float64
In [7]:
print("Culmen Depth (mm): ", culmen_depth_mm)
Culmen Depth (mm): 0 18.7
1 17.4
2 18.0
3 19.3
4 20.6
...
328 13.7
329 14.3
330 15.7
331 14.8
332 16.1
Name: culmen_depth_mm, Length: 333, dtype: float64
In [8]:
print("Flipper Length (mm): ", flipper_length_mm)
Flipper Length (mm): 0 181.0
1 186.0
2 195.0
3 193.0
4 190.0
...
328 214.0
329 215.0
330 222.0
331 212.0
332 213.0
Name: flipper_length_mm, Length: 333, dtype: float64
In [9]:
print("Body Mass (g): ", body_mass_g)
Body Mass (g): 0 3750.0
1 3800.0
2 3250.0
3 3450.0
4 3650.0
...
328 4925.0
329 4850.0
330 5750.0
331 5200.0
332 5400.0
Name: body_mass_g, Length: 333, dtype: float64
In [10]:
print("Gender: ", gender)
Gender: 0 MALE
1 FEMALE
2 FEMALE
3 FEMALE
4 MALE
...
328 FEMALE
329 FEMALE
330 MALE
331 FEMALE
332 MALE
Name: sex, Length: 333, dtype: object
Activities
1. Add a constant value (100) to the 'body_mass_g' series
In [12]:
body_mass_g_plus_100 = body_mass_g + 100
print(body_mass_g_plus_100)
0 3850.0
1 3900.0
2 3350.0
3 3550.0
4 3750.0
...
328 5025.0
329 4950.0
330 5850.0
331 5300.0
332 5500.0
Name: body_mass_g, Length: 333, dtype: float64
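Adding a scalar to a Series broadcasts the scalar to every element, so no explicit loop is needed. A minimal sketch, assuming a Series built from the first five `body_mass_g` values above, comparing the operator, the `Series.add` method, and an equivalent Python loop:

```python
import pandas as pd

# First five body_mass_g values from the dataset
body_mass_g = pd.Series([3750.0, 3800.0, 3250.0, 3450.0, 3650.0], name="body_mass_g")

# The scalar 100 is broadcast to every element
plus_operator = body_mass_g + 100

# Method form of +; .add also accepts fill_value for missing data
plus_method = body_mass_g.add(100)

# The same result built with an explicit Python loop, for comparison only
plus_loop = pd.Series([v + 100 for v in body_mass_g], name="body_mass_g")

print(plus_operator.tolist())  # [3850.0, 3900.0, 3350.0, 3550.0, 3750.0]
print(plus_operator.equals(plus_method) and plus_operator.equals(plus_loop))  # True
```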
2. Subtract the 'culmen_length_mm' series from the 'flipper_length_mm' series
In [17]:
length_difference = flipper_length_mm - culmen_length_mm
print(length_difference)
0 141.9
1 146.5
2 154.7
3 156.3
4 150.7
...
328 166.8
329 168.2
330 171.6
331 166.8
332 163.1
Length: 333, dtype: float64
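The output above has no `Name:` line because the two operands have different names, so pandas leaves the result unnamed. Series arithmetic also matches elements by index label rather than by position. A minimal sketch, assuming small hand-built Series; the shifted index labels in `culmen_shifted` are hypothetical and only illustrate alignment:

```python
import pandas as pd

flipper = pd.Series([181.0, 186.0, 195.0], index=[0, 1, 2], name="flipper_length_mm")
culmen = pd.Series([39.1, 39.5, 40.3], index=[0, 1, 2], name="culmen_length_mm")

# Different operand names: the result's name is None, so no "Name:" line is printed
diff = flipper - culmen
print(diff.name)  # None

# Labels are matched, not positions: label 0 exists only in `flipper`
# and label 3 only in `culmen_shifted`, so both become NaN
culmen_shifted = pd.Series([39.5, 40.3, 36.7], index=[1, 2, 3])
aligned = flipper - culmen_shifted
print(aligned.index.tolist())   # [0, 1, 2, 3]
print(aligned.isna().tolist())  # [True, False, False, True]

# .sub with fill_value treats a label missing on one side as 0 before subtracting
filled = flipper.sub(culmen_shifted, fill_value=0)
print(filled.loc[[0, 3]].tolist())  # [181.0, -36.7]
```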
3. Multiply the 'culmen_depth_mm' series by 2
In [19]:
double_culmen_depth_mm = culmen_depth_mm * 2
print(double_culmen_depth_mm)
0 37.4
1 34.8
2 36.0
3 38.6
4 41.2
...
328 27.4
329 28.6
330 31.4
331 29.6
332 32.2
Name: culmen_depth_mm, Length: 333, dtype: float64
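Multiplying by a scalar keeps the Series name, while multiplying two Series pairs their elements by index label. A minimal sketch, assuming Series built from the first three rows above; `culmen_product` is a hypothetical, illustrative quantity and not a column of the dataset:

```python
import pandas as pd

culmen_depth_mm = pd.Series([18.7, 17.4, 18.0], name="culmen_depth_mm")
culmen_length_mm = pd.Series([39.1, 39.5, 40.3], name="culmen_length_mm")

# Series * scalar: every element is doubled and the name is kept
doubled = culmen_depth_mm * 2
print(doubled.name)                            # culmen_depth_mm
print(doubled.equals(culmen_depth_mm.mul(2)))  # True (.mul is the method form of *)

# Series * Series: elements are paired by index label; the name is dropped
# because the two operand names differ
culmen_product = culmen_length_mm * culmen_depth_mm
print(culmen_product.round(2).tolist())
```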
4. Raise the 'flipper_length_mm' series to the power of 2
In [21]:
flipper_length_mm_squared = flipper_length_mm ** 2
print(flipper_length_mm_squared)
0 32761.0
1 34596.0
2 38025.0
3 37249.0
4 36100.0
...
328 45796.0
329 46225.0
330 49284.0
331 44944.0
332 45369.0
Name: flipper_length_mm, Length: 333, dtype: float64
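`**` applies element-wise and has a method form, `Series.pow`. NumPy universal functions such as `np.sqrt` also accept a Series and return a Series, which gives a quick way to undo the squaring. A minimal sketch, assuming the first three `flipper_length_mm` values above:

```python
import numpy as np
import pandas as pd

flipper_length_mm = pd.Series([181.0, 186.0, 195.0], name="flipper_length_mm")

# Element-wise power, operator and method forms
squared = flipper_length_mm ** 2
squared_method = flipper_length_mm.pow(2)

# A NumPy ufunc applied to a Series returns a Series with the same index
roots = np.sqrt(squared)

print(squared.tolist())                # [32761.0, 34596.0, 38025.0]
print(squared.equals(squared_method))  # True
print(roots.tolist())                  # [181.0, 186.0, 195.0]
```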
5. Calculate the mean of the 'culmen_length_mm' series and subtract it from each value in the series
In [32]:
culmen_length_mm_mean_centered = culmen_length_mm - culmen_length_mm.mean()
print(culmen_length_mm_mean_centered)
0 -4.892793
1 -4.492793
2 -3.692793
3 -7.292793
4 -4.692793
...
328 3.207207
329 2.807207
330 6.407207
331 1.207207
332 5.907207
Name: culmen_length_mm, Length: 333, dtype: float64
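`.mean()` reduces the Series to a single float, and subtracting that float broadcasts it like any other scalar. A centered Series should therefore have a mean of (approximately) zero and the same spread as the original. A minimal sketch, assuming only the first five `culmen_length_mm` values, so the mean here differs from the full-dataset mean used above:

```python
import pandas as pd

culmen_length_mm = pd.Series([39.1, 39.5, 40.3, 36.7, 39.3], name="culmen_length_mm")

# Reduce to one scalar, then broadcast the subtraction
mean_value = culmen_length_mm.mean()
centered = culmen_length_mm - mean_value

# The centered values average to zero, up to floating-point error
print(abs(centered.mean()) < 1e-9)  # True
# Shifting every value by a constant leaves the standard deviation unchanged
print(abs(centered.std() - culmen_length_mm.std()) < 1e-9)  # True
```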
6. Concatenate the 'species' and 'gender' series, separated by '-'
In [34]:
species_and_gender = species + '-' + gender
print(species_and_gender)
0 Adelie-MALE
1 Adelie-FEMALE
2 Adelie-FEMALE
3 Adelie-FEMALE
4 Adelie-MALE
...
328 Gentoo-FEMALE
329 Gentoo-FEMALE
330 Gentoo-MALE
331 Gentoo-FEMALE
332 Gentoo-MALE
Length: 333, dtype: object
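With `+`, a missing value on either side makes the concatenated result missing. `Series.str.cat` performs the same element-wise join with a `sep` argument and can substitute missing values through `na_rep`. A minimal sketch, assuming a hand-built `gender` Series in which `None` stands in for a missing value (the cleaned CSV may contain none) and `"UNKNOWN"` is a hypothetical placeholder label:

```python
import pandas as pd

species = pd.Series(["Adelie", "Adelie", "Gentoo"], name="species")
gender = pd.Series(["MALE", None, "FEMALE"], name="sex")

# Operator form: the row with a missing gender becomes missing
plus_result = species + "-" + gender
print(plus_result.isna().tolist())  # [False, True, False]

# str.cat joins element-wise with a separator; na_rep fills missing values
cat_result = species.str.cat(gender, sep="-", na_rep="UNKNOWN")
print(cat_result.tolist())  # ['Adelie-MALE', 'Adelie-UNKNOWN', 'Gentoo-FEMALE']
```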
7. Add the 'culmen_length_mm' and 'culmen_depth_mm' series element-wise
In [37]:
culmen_length_plus_depth_mm = culmen_length_mm + culmen_depth_mm
print(culmen_length_plus_depth_mm)
0 57.8
1 56.9
2 58.3
3 56.0
4 59.9
...
328 60.9
329 61.1
330 66.1
331 60.0
332 66.0
Length: 333, dtype: float64
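As in the subtraction activity, the sum of two differently named Series is unnamed. Passing a scalar to `Series.rename` sets the name explicitly. A minimal sketch, assuming the first two rows of the dataset; the label `culmen_length_plus_depth_mm` is chosen here for illustration:

```python
import pandas as pd

culmen_length_mm = pd.Series([39.1, 39.5], name="culmen_length_mm")
culmen_depth_mm = pd.Series([18.7, 17.4], name="culmen_depth_mm")

total = culmen_length_mm + culmen_depth_mm
print(total.name)  # None

# A scalar argument to rename sets Series.name (values and index are unchanged)
total = total.rename("culmen_length_plus_depth_mm")
print(total.name)  # culmen_length_plus_depth_mm
```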
8. Sort the 'culmen_length_mm' series in descending order
In [41]:
culmen_length_mm_sorted = culmen_length_mm.sort_values(ascending=False)
print(culmen_length_mm_sorted)
246 59.6
163 58.0
313 55.9
209 55.8
326 55.1
...
13 34.4
86 34.0
64 33.5
92 33.1
136 32.1
Name: culmen_length_mm, Length: 333, dtype: float64
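The sorted output keeps each value's original index label (246, 163, ...), which is how a value can be traced back to its row. `Series.nlargest` returns the top values directly, and `reset_index(drop=True)` renumbers the rows. A minimal sketch, assuming a Series holding five of the (index, value) pairs shown above:

```python
import pandas as pd

# Five (index, value) pairs taken from the sorted output above
culmen_length_mm = pd.Series(
    [32.1, 58.0, 55.8, 59.6, 55.9],
    index=[136, 163, 209, 246, 313],
    name="culmen_length_mm",
)

# sort_values reorders the values but keeps their original labels
desc = culmen_length_mm.sort_values(ascending=False)
print(desc.index.tolist())  # [246, 163, 313, 209, 136]

# nlargest(n) returns the n largest values, largest first
top3 = culmen_length_mm.nlargest(3)
print(top3.tolist())  # [59.6, 58.0, 55.9]

# reset_index(drop=True) discards the old labels and numbers rows from 0
print(desc.reset_index(drop=True).index.tolist())  # [0, 1, 2, 3, 4]
```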
9. Divide the 'flipper_length_mm' series by the 'culmen_length_mm' series
In [43]:
length_ratio = flipper_length_mm / culmen_length_mm
print(length_ratio)
0 4.629156
1 4.708861
2 4.838710
3 5.258856
4 4.834606
...
328 4.533898
329 4.594017
330 4.404762
331 4.690265
332 4.268537
Length: 333, dtype: float64
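Element-wise division of float Series does not raise on a zero divisor; the affected element becomes `inf`. A minimal sketch, assuming a hand-built divisor with a zero inserted purely for illustration (the cleaned dataset's culmen lengths are all positive), showing `Series.div` and one way to turn infinities into `NaN`:

```python
import numpy as np
import pandas as pd

flipper_length_mm = pd.Series([181.0, 186.0, 195.0], name="flipper_length_mm")
# The 0.0 is made up to show the behaviour; it is not in the dataset
divisor = pd.Series([39.1, 0.0, 40.3], name="culmen_length_mm")

# Float division by zero yields inf instead of raising ZeroDivisionError
ratio = flipper_length_mm / divisor
print(np.isinf(ratio).tolist())  # [False, True, False]

# .div is the method form of /; infinities can be replaced with NaN before analysis
cleaned = flipper_length_mm.div(divisor).replace([np.inf, -np.inf], np.nan)
print(cleaned.isna().tolist())  # [False, True, False]
```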