Statement of Completion#c2f0a305
Intro to Pandas for Data Analysis
easy
Practicing filtering sorting with Pokemon
Resolution
Activities
Task 0 - Setup¶
There isn't much to do here, we'll provide the required imports and the read the pokemon CSV we'll be working with.
In [32]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [33]:
df = pd.read_csv("pokemon.csv")
In [34]:
df.head()
Out[34]:
| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
| 1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
| 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
| 3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
| 4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
In [35]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 721 entries, 0 to 720 Data columns (total 13 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 # 721 non-null int64 1 Name 721 non-null object 2 Type 1 721 non-null object 3 Type 2 359 non-null object 4 Total 721 non-null int64 5 HP 721 non-null int64 6 Attack 721 non-null int64 7 Defense 721 non-null int64 8 Sp. Atk 721 non-null int64 9 Sp. Def 721 non-null int64 10 Speed 721 non-null int64 11 Generation 721 non-null int64 12 Legendary 721 non-null bool dtypes: bool(1), int64(9), object(3) memory usage: 68.4+ KB
In [36]:
df.describe()
Out[36]:
| # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | |
|---|---|---|---|---|---|---|---|---|---|
| count | 721.00000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 |
| mean | 361.00000 | 417.945908 | 68.380028 | 75.124827 | 70.697642 | 68.848821 | 69.180305 | 65.714286 | 3.323162 |
| std | 208.27906 | 109.663671 | 25.848272 | 29.070335 | 29.194941 | 28.898590 | 26.899364 | 27.277920 | 1.669873 |
| min | 1.00000 | 180.000000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 | 20.000000 | 5.000000 | 1.000000 |
| 25% | 181.00000 | 320.000000 | 50.000000 | 54.000000 | 50.000000 | 45.000000 | 50.000000 | 45.000000 | 2.000000 |
| 50% | 361.00000 | 424.000000 | 65.000000 | 75.000000 | 65.000000 | 65.000000 | 65.000000 | 65.000000 | 3.000000 |
| 75% | 541.00000 | 499.000000 | 80.000000 | 95.000000 | 85.000000 | 90.000000 | 85.000000 | 85.000000 | 5.000000 |
| max | 721.00000 | 720.000000 | 255.000000 | 165.000000 | 230.000000 | 154.000000 | 230.000000 | 160.000000 | 6.000000 |
Distribution of Pokemon Types:¶
In [37]:
df['Type 1'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='tab20c', figsize=(10, 8))
Out[37]:
<Axes: ylabel='count'>
Distribution of Pokemon Totals:¶
In [38]:
df['Total'].plot(kind='hist', figsize=(10, 8))
Out[38]:
<Axes: ylabel='Frequency'>
In [40]:
df['Total'].plot(kind='box', vert=False, figsize=(10, 5))
Out[40]:
<Axes: >
Distribution of Legendary Pokemons:¶
In [41]:
df['Legendary'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Out[41]:
<Axes: ylabel='count'>
Basic filtering¶
Let's start with a few simple activities regarding filtering.
1. How many Pokemons exist with an Attack value greater than 150?¶
Doing a little bit of visual exploration, we can have a sense of the most "powerful" pokemons (defined by their "Attack" feature). A boxplot is a great way to visualize this:
In [42]:
sns.boxplot(data=df, x='Attack')
Out[42]:
<Axes: xlabel='Attack'>
In [43]:
# Try your code here
2. Select all pokemons with a Speed of 10 or less¶
In [44]:
sns.boxplot(data=df, x='Speed')
Out[44]:
<Axes: xlabel='Speed'>
In [45]:
slow_pokemons_df = df.loc[df['Speed']<=10]
3. How many Pokemons have a Sp. Def value of 25 or less?¶
In [46]:
df.loc[df['Sp. Def']<=25].shape
Out[46]:
(17, 13)
4. Select all the Legendary pokemons¶
In [47]:
# Try your code here
legendary_df = df.loc[df['Legendary']]
5. Find the outlier¶
Find the pokemon that is clearly an outlier in terms of Attack / Defense:
In [48]:
ax = sns.scatterplot(data=df, x="Defense", y="Attack")
ax.annotate(
"Who's this guy?", xy=(228, 10), xytext=(150, 10), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
Out[48]:
Text(150, 10, "Who's this guy?")
In [49]:
df.query("Defense > 200")
Out[49]:
| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 212 | 213 | Shuckle | Bug | Rock | 505 | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False |
Advanced selection¶
Now let's use boolean operators to create more advanced expressions
6. How many Fire-Flying Pokemons are there?¶
In [50]:
>>> df.loc[df['Type 1'] == 'Fire', 'Type 2'].value_counts()
Out[50]:
Type 2 Fighting 6 Flying 5 Ground 2 Psychic 2 Normal 2 Rock 1 Steel 1 Water 1 Name: count, dtype: int64
7. How many 'Poison' pokemons are across both types?¶
In [51]:
>>> df.loc[(df['Type 1'] == 'Poison') | (df['Type 2'] == 'Poison')].shape
Out[51]:
(59, 13)
8. Name the pokemon of Type 1 Ice which has the strongest defense?¶
In [52]:
df.loc[df['Type 1'] == 'Ice'].sort_values(by='Defense', ascending=False)
# Try your code here
Out[52]:
| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 712 | 713 | Avalugg | Ice | NaN | 514 | 95 | 117 | 184 | 44 | 46 | 28 | 6 | False |
| 470 | 471 | Glaceon | Ice | NaN | 525 | 65 | 60 | 110 | 130 | 95 | 65 | 4 | False |
| 143 | 144 | Articuno | Ice | Flying | 580 | 90 | 85 | 100 | 95 | 125 | 85 | 1 | True |
| 377 | 378 | Regice | Ice | NaN | 580 | 80 | 50 | 100 | 100 | 200 | 50 | 3 | True |
| 364 | 365 | Walrein | Ice | Water | 530 | 110 | 80 | 90 | 95 | 90 | 65 | 3 | False |
| 711 | 712 | Bergmite | Ice | NaN | 304 | 55 | 69 | 85 | 32 | 35 | 28 | 6 | False |
| 583 | 584 | Vanilluxe | Ice | NaN | 535 | 71 | 95 | 85 | 110 | 95 | 79 | 5 | False |
| 613 | 614 | Beartic | Ice | NaN | 485 | 95 | 110 | 80 | 70 | 80 | 50 | 5 | False |
| 220 | 221 | Piloswine | Ice | Ground | 450 | 100 | 100 | 80 | 60 | 60 | 50 | 2 | False |
| 472 | 473 | Mamoswine | Ice | Ground | 530 | 110 | 130 | 80 | 70 | 60 | 80 | 4 | False |
| 361 | 362 | Glalie | Ice | NaN | 480 | 80 | 80 | 80 | 80 | 80 | 80 | 3 | False |
| 477 | 478 | Froslass | Ice | Ghost | 480 | 70 | 80 | 70 | 80 | 70 | 110 | 4 | False |
| 363 | 364 | Sealeo | Ice | Water | 410 | 90 | 60 | 70 | 75 | 70 | 45 | 3 | False |
| 582 | 583 | Vanillish | Ice | NaN | 395 | 51 | 65 | 65 | 80 | 75 | 59 | 5 | False |
| 360 | 361 | Snorunt | Ice | NaN | 300 | 50 | 50 | 50 | 50 | 50 | 50 | 3 | False |
| 362 | 363 | Spheal | Ice | Water | 290 | 70 | 40 | 50 | 55 | 50 | 25 | 3 | False |
| 581 | 582 | Vanillite | Ice | NaN | 305 | 36 | 50 | 50 | 65 | 60 | 44 | 5 | False |
| 224 | 225 | Delibird | Ice | Flying | 330 | 45 | 55 | 45 | 65 | 45 | 75 | 2 | False |
| 219 | 220 | Swinub | Ice | Ground | 250 | 50 | 50 | 40 | 30 | 30 | 50 | 2 | False |
| 612 | 613 | Cubchoo | Ice | NaN | 305 | 55 | 70 | 40 | 60 | 40 | 40 | 5 | False |
| 123 | 124 | Jynx | Ice | Psychic | 455 | 65 | 50 | 35 | 115 | 95 | 95 | 1 | False |
| 614 | 615 | Cryogonal | Ice | NaN | 485 | 70 | 50 | 30 | 95 | 135 | 105 | 5 | False |
| 237 | 238 | Smoochum | Ice | Psychic | 305 | 45 | 30 | 15 | 85 | 65 | 65 | 2 | False |
9. What's the most common type of Legendary Pokemons?¶
In [53]:
df.loc[df['Legendary'], 'Type 1'].value_counts()
Out[53]:
Type 1 Psychic 8 Dragon 7 Fire 5 Steel 4 Water 3 Rock 3 Electric 3 Ice 2 Ground 2 Normal 2 Dark 2 Grass 2 Ghost 1 Flying 1 Fairy 1 Name: count, dtype: int64
10. What's the most powerful pokemon from the first 3 generations, of type water?¶
In [54]:
df.loc[df['Generation'].isin((1, 2, 3)) & (df['Type 1'] == 'Water')].sort_values(by='Total', ascending=False).head()
Out[54]:
| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 381 | 382 | Kyogre | Water | NaN | 670 | 100 | 100 | 90 | 150 | 140 | 90 | 3 | True |
| 244 | 245 | Suicune | Water | NaN | 580 | 100 | 75 | 115 | 90 | 115 | 85 | 2 | True |
| 229 | 230 | Kingdra | Water | Dragon | 540 | 75 | 95 | 95 | 95 | 95 | 85 | 2 | False |
| 129 | 130 | Gyarados | Water | Flying | 540 | 95 | 125 | 79 | 60 | 100 | 81 | 1 | False |
| 349 | 350 | Milotic | Water | NaN | 540 | 95 | 60 | 79 | 100 | 125 | 81 | 3 | False |
11. What's the most powerful Dragon from the last two generations?¶
In [55]:
df.loc[
((df['Type 1'] == 'Dragon') | (df['Type 2'] == 'Dragon')) &
df['Generation'].isin((5, 6))
].sort_values(by='Total', ascending=False).head()
Out[55]:
| # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 643 | 644 | Zekrom | Dragon | Electric | 680 | 100 | 150 | 120 | 120 | 100 | 90 | 5 | True |
| 642 | 643 | Reshiram | Dragon | Fire | 680 | 100 | 120 | 100 | 150 | 120 | 90 | 5 | True |
| 645 | 646 | Kyurem | Dragon | Ice | 660 | 125 | 130 | 90 | 130 | 90 | 95 | 5 | True |
| 705 | 706 | Goodra | Dragon | NaN | 600 | 90 | 100 | 70 | 110 | 150 | 80 | 6 | False |
| 717 | 718 | Zygarde50% Forme | Dragon | Ground | 600 | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True |
12. Select most powerful Fire-type pokemons¶
In [56]:
powerful_fire_df = df.loc[
(df['Type 1'] == 'Fire') &
(df['Attack'] > 100)
]
13. Select all Water-type, Flying-type pokemons¶
In [58]:
water_flying_df = df.query("`Type 1` == 'Water' and `Type 2` == 'Flying'")
14. Select specific columns of Legendary pokemons of type Fire¶
In [60]:
legendary_fire_df = df.loc[
df['Legendary'] & (df['Type 1'] == 'Fire'), # Condition (or index-level)
['Name', 'Attack', 'Generation'] # Columns
]
15. Select Slow and Fast pokemons¶
This is the distribution of speed of the pokemons. The red lines indicate those bottom 5% and top 5% pokemons by speed:
In [ ]:
ax = df['Speed'].plot(kind='hist', figsize=(10, 5), bins=100)
ax.axvline(df['Speed'].quantile(.05), color='red')
ax.axvline(df['Speed'].quantile(.95), color='red')
In [ ]:
slow_fast_df = df.loc[
(df['Speed'] < df['Speed'].quantile(.05)) |
(df['Speed'] > df['Speed'].quantile(.95))
]
16. Find the Ultra Powerful Legendary Pokemon¶
In [ ]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=df, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
"Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
In [64]:
df.loc[
df['Legendary'] &
(df['Attack'] > 140)
, ['Name', 'Attack', 'Defense']].sort_values(by=['Defense'], ascending=False)
Out[64]:
| Name | Attack | Defense | |
|---|---|---|---|
| 382 | Groudon | 150 | 140 |
| 643 | Zekrom | 150 | 120 |
| 485 | Regigigas | 160 | 110 |
| 383 | Rayquaza | 150 | 90 |
| 385 | DeoxysNormal Forme | 150 | 50 |