Statement of Completion#4261473c
Intro to Pandas for Data Analysis
easy
Practicing filtering sorting with Pokemon
Resolution
Activities
Project.ipynb
Task 0 - Setup¶
There isn't much to do here, we'll provide the required imports and the read the pokemon CSV we'll be working with.
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [3]:
df = pd.read_csv("pokemon.csv")
In [4]:
df.head()
Out[4]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 721 entries, 0 to 720 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 # 721 non-null int64 1 Name 721 non-null object 2 Type 1 721 non-null object 3 Type 2 359 non-null object 4 Total 721 non-null int64 5 HP 721 non-null int64 6 Attack 721 non-null int64 7 Defense 721 non-null int64 8 Sp. Atk 721 non-null int64 9 Sp. Def 721 non-null int64 10 Speed 721 non-null int64 11 Generation 721 non-null int64 12 Legendary 721 non-null bool dtypes: bool(1), int64(9), object(3) memory usage: 68.4+ KB
In [6]:
df.describe()
Out[6]:
# | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | |
---|---|---|---|---|---|---|---|---|---|
count | 721.00000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 |
mean | 361.00000 | 417.945908 | 68.380028 | 75.124827 | 70.697642 | 68.848821 | 69.180305 | 65.714286 | 3.323162 |
std | 208.27906 | 109.663671 | 25.848272 | 29.070335 | 29.194941 | 28.898590 | 26.899364 | 27.277920 | 1.669873 |
min | 1.00000 | 180.000000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 | 20.000000 | 5.000000 | 1.000000 |
25% | 181.00000 | 320.000000 | 50.000000 | 54.000000 | 50.000000 | 45.000000 | 50.000000 | 45.000000 | 2.000000 |
50% | 361.00000 | 424.000000 | 65.000000 | 75.000000 | 65.000000 | 65.000000 | 65.000000 | 65.000000 | 3.000000 |
75% | 541.00000 | 499.000000 | 80.000000 | 95.000000 | 85.000000 | 90.000000 | 85.000000 | 85.000000 | 5.000000 |
max | 721.00000 | 720.000000 | 255.000000 | 165.000000 | 230.000000 | 154.000000 | 230.000000 | 160.000000 | 6.000000 |
Distribution of Pokemon Types:¶
In [54]:
df['Type 1'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='tab20c', figsize=(10, 8))
Out[54]:
<Axes: ylabel='count'>
Distribution of Pokemon Totals:¶
In [8]:
df['Total'].plot(kind='hist', figsize=(10, 8))
Out[8]:
<Axes: ylabel='Frequency'>
In [25]:
df['Total'].plot(kind='box', vert=False, figsize=(10, 5))
Out[25]:
<Axes: >
Distribution of Legendary Pokemons:¶
In [10]:
df['Legendary'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Out[10]:
<Axes: ylabel='count'>
Basic filtering¶
Let's start with a few simple activities regarding filtering.
1. How many Pokemons exist with an Attack
value greater than 150?¶
Doing a little bit of visual exploration, we can have a sense of the most "powerful" pokemons (defined by their "Attack" feature). A boxplot is a great way to visualize this:
In [26]:
sns.boxplot(data=df, x='Attack')
Out[26]:
<Axes: xlabel='Attack'>
In [34]:
df[['Name', 'Attack']].sort_values(by='Attack', ascending= False)
Out[34]:
Name | Attack | |
---|---|---|
408 | Rampardos | 165 |
485 | Regigigas | 160 |
288 | Slaking | 160 |
382 | Groudon | 150 |
385 | DeoxysNormal Forme | 150 |
... | ... | ... |
128 | Magikarp | 10 |
212 | Shuckle | 10 |
241 | Blissey | 10 |
439 | Happiny | 5 |
112 | Chansey | 5 |
721 rows × 2 columns
2. Select all pokemons with a Speed of 10
or less¶
In [35]:
sns.boxplot(data=df, x='Speed')
Out[35]:
<Axes: xlabel='Speed'>
In [43]:
slow_pokemons_df = df[df['Speed']<=10]
3. How many Pokemons have a Sp. Def
value of 25 or less?¶
In [45]:
df.head()
Out[45]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
In [48]:
df[df['Sp. Def']<=25].count()
Out[48]:
# 17 Name 17 Type 1 17 Type 2 6 Total 17 HP 17 Attack 17 Defense 17 Sp. Atk 17 Sp. Def 17 Speed 17 Generation 17 Legendary 17 dtype: int64
4. Select all the Legendary pokemons¶
In [49]:
# Try your code here
legendary_df = df[df['Legendary']==True]
5. Find the outlier¶
Find the pokemon that is clearly an outlier in terms of Attack / Defense:
In [51]:
ax = sns.scatterplot(data=df, x="Defense", y="Attack")
ax.annotate(
"Who's this guy?", xy=(228, 10), xytext=(150, 10), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
Out[51]:
Text(150, 10, "Who's this guy?")
In [52]:
df[(df['Defense']>200)&(df['Attack']<20)]
Out[52]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
212 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False |
Advanced selection¶
Now let's use boolean operators to create more advanced expressions
6. How many Fire-Flying Pokemons are there?¶
In [61]:
df[(df['Type 1']=='Fire')&(df['Type 2']=='Flying')]
Out[61]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | Charizard | Fire | Flying | 534 | 78 | 84 | 78 | 109 | 85 | 100 | 1 | False |
145 | 146 | Moltres | Fire | Flying | 580 | 90 | 100 | 90 | 125 | 85 | 90 | 1 | True |
249 | 250 | Ho-oh | Fire | Flying | 680 | 106 | 130 | 90 | 110 | 154 | 90 | 2 | True |
661 | 662 | Fletchinder | Fire | Flying | 382 | 62 | 73 | 55 | 56 | 52 | 84 | 6 | False |
662 | 663 | Talonflame | Fire | Flying | 499 | 78 | 81 | 71 | 74 | 69 | 126 | 6 | False |
7. How many 'Poison' pokemons are across both types?¶
In [65]:
df[(df['Type 1']=='Poison')|(df['Type 2']=='Poison')].count()
Out[65]:
# 59 Name 59 Type 1 59 Type 2 44 Total 59 HP 59 Attack 59 Defense 59 Sp. Atk 59 Sp. Def 59 Speed 59 Generation 59 Legendary 59 dtype: int64
In [63]:
df[(df['Type 1']=='Poison')].count()
Out[63]:
# 28 Name 28 Type 1 28 Type 2 13 Total 28 HP 28 Attack 28 Defense 28 Sp. Atk 28 Sp. Def 28 Speed 28 Generation 28 Legendary 28 dtype: int64
In [64]:
df[(df['Type 2']=='Poison')].count()
Out[64]:
# 31 Name 31 Type 1 31 Type 2 31 Total 31 HP 31 Attack 31 Defense 31 Sp. Atk 31 Sp. Def 31 Speed 31 Generation 31 Legendary 31 dtype: int64
8. Name the pokemon of Type 1
Ice which has the strongest defense?¶
In [79]:
df[df['Type 1']=='Ice'].sort_values(by='Defense', ascending=False)
Out[79]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
712 | 713 | Avalugg | Ice | NaN | 514 | 95 | 117 | 184 | 44 | 46 | 28 | 6 | False |
470 | 471 | Glaceon | Ice | NaN | 525 | 65 | 60 | 110 | 130 | 95 | 65 | 4 | False |
143 | 144 | Articuno | Ice | Flying | 580 | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True |
377 | 378 | Regice | Ice | NaN | 580 | 80 | 50 | 100 | 100 | 200 | 50 | 3 | True |
364 | 365 | Walrein | Ice | Water | 530 | 110 | 80 | 90 | 95 | 90 | 65 | 3 | False |
711 | 712 | Bergmite | Ice | NaN | 304 | 55 | 69 | 85 | 32 | 35 | 28 | 6 | False |
583 | 584 | Vanilluxe | Ice | NaN | 535 | 71 | 95 | 85 | 110 | 95 | 79 | 5 | False |
613 | 614 | Beartic | Ice | NaN | 485 | 95 | 110 | 80 | 70 | 80 | 50 | 5 | False |
220 | 221 | Piloswine | Ice | Ground | 450 | 100 | 100 | 80 | 60 | 60 | 50 | 2 | False |
472 | 473 | Mamoswine | Ice | Ground | 530 | 110 | 130 | 80 | 70 | 60 | 80 | 4 | False |
361 | 362 | Glalie | Ice | NaN | 480 | 80 | 80 | 80 | 80 | 80 | 80 | 3 | False |
477 | 478 | Froslass | Ice | Ghost | 480 | 70 | 80 | 70 | 80 | 70 | 110 | 4 | False |
363 | 364 | Sealeo | Ice | Water | 410 | 90 | 60 | 70 | 75 | 70 | 45 | 3 | False |
582 | 583 | Vanillish | Ice | NaN | 395 | 51 | 65 | 65 | 80 | 75 | 59 | 5 | False |
360 | 361 | Snorunt | Ice | NaN | 300 | 50 | 50 | 50 | 50 | 50 | 50 | 3 | False |
362 | 363 | Spheal | Ice | Water | 290 | 70 | 40 | 50 | 55 | 50 | 25 | 3 | False |
581 | 582 | Vanillite | Ice | NaN | 305 | 36 | 50 | 50 | 65 | 60 | 44 | 5 | False |
224 | 225 | Delibird | Ice | Flying | 330 | 45 | 55 | 45 | 65 | 45 | 75 | 2 | False |
219 | 220 | Swinub | Ice | Ground | 250 | 50 | 50 | 40 | 30 | 30 | 50 | 2 | False |
612 | 613 | Cubchoo | Ice | NaN | 305 | 55 | 70 | 40 | 60 | 40 | 40 | 5 | False |
123 | 124 | Jynx | Ice | Psychic | 455 | 65 | 50 | 35 | 115 | 95 | 95 | 1 | False |
614 | 615 | Cryogonal | Ice | NaN | 485 | 70 | 50 | 30 | 95 | 135 | 105 | 5 | False |
237 | 238 | Smoochum | Ice | Psychic | 305 | 45 | 30 | 15 | 85 | 65 | 65 | 2 | False |
9. What's the most common type of Legendary Pokemons?¶
In [85]:
print(df[df['Legendary']==True].value_counts('Type 1'))
Type 1 Psychic 8 Dragon 7 Fire 5 Steel 4 Water 3 Rock 3 Electric 3 Normal 2 Ice 2 Ground 2 Dark 2 Grass 2 Ghost 1 Fairy 1 Flying 1 Name: count, dtype: int64
10. What's the most powerful pokemon from the first 3 generations, of type water?¶
In [86]:
df.head()
Out[86]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
In [105]:
q10=df[
(
df['Generation'].isin([1,2,3])
)
& (
df['Type 1']=='Water'
)]
q10[q10['Total']==q10['Total'].max()]
Out[105]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
381 | 382 | Kyogre | Water | NaN | 670 | 100 | 100 | 90 | 150 | 140 | 90 | 3 | True |
11. What's the most powerful Dragon from the last two generations?¶
In [107]:
mon=df[(df['Generation'].isin([5,6]))&((df['Type 1']=='Dragon')|(df['Type 2']=='Dragon'))]
mon[mon['Total']==mon['Total'].max()]
Out[107]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
642 | 643 | Reshiram | Dragon | Fire | 680 | 100 | 120 | 100 | 150 | 120 | 90 | 5 | True |
643 | 644 | Zekrom | Dragon | Electric | 680 | 100 | 150 | 120 | 120 | 100 | 90 | 5 | True |
In [108]:
df.head()
Out[108]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
12. Select most powerful Fire-type pokemons¶
In [111]:
powerful_fire_df = df[(df['Attack']>100)&(df['Type 1']=='Fire')]
13. Select all Water-type, Flying-type pokemons¶
In [113]:
# Try your code here
water_flying_df = df[(df['Type 1']=='Water')&(df['Type 2']=='Flying')]
14. Select specific columns of Legendary pokemons of type Fire¶
In [129]:
legendary_fire_df=df[(df['Legendary']==True) & (df['Type 1']=='Fire')][['Name', 'Attack', 'Generation']]
15. Select Slow and Fast pokemons¶
This is the distribution of speed of the pokemons. The red lines indicate those bottom 5% and top 5% pokemons by speed:
In [131]:
ax = df['Speed'].plot(kind='hist', figsize=(10, 5), bins=100)
ax.axvline(df['Speed'].quantile(.05), color='red')
ax.axvline(df['Speed'].quantile(.95), color='red')
Out[131]:
<matplotlib.lines.Line2D at 0x79ca8fe15b90>
In [144]:
lowermargin=df['Speed'].quantile(.05)
uppermargin=df['Speed'].quantile(.95)
slow_fast_df=df[(df['Speed']<lowermargin)|(df['Speed']>uppermargin)]
In [140]:
print(lowermargin)
25.0
In [132]:
df['Speed'].describe()
Out[132]:
count 721.000000 mean 65.714286 std 27.277920 min 5.000000 25% 45.000000 50% 65.000000 75% 85.000000 max 160.000000 Name: Speed, dtype: float64
In [ ]:
# Try your code here
slow_fast_df = ...
16. Find the Ultra Powerful Legendary Pokemon¶
In [146]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=df, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
"Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
Out[146]:
Text(160, 150, "Who's this guy?")
In [147]:
df[(df['Legendary']==True)&(df['Defense']>130)&(df['Attack']>140)]
Out[147]:
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
382 | 383 | Groudon | Ground | Fire | 670 | 100 | 150 | 140 | 100 | 90 | 90 | 3 | True |