Intro to Pandas for Data Analysis
Practicing filtering and sorting with Pokemon
Task 0 - Setup
There isn't much to do here: we provide the required imports and read the Pokemon CSV we'll be working with.
In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
df = pd.read_csv("pokemon.csv")
In [5]:
df.head()
Out[5]:
 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
4 | 5 | Charmeleon | Fire | NaN | 405 | 58 | 64 | 58 | 80 | 65 | 80 | 1 | False |
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 721 entries, 0 to 720
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   #           721 non-null    int64
 1   Name        721 non-null    object
 2   Type 1      721 non-null    object
 3   Type 2      359 non-null    object
 4   Total       721 non-null    int64
 5   HP          721 non-null    int64
 6   Attack      721 non-null    int64
 7   Defense     721 non-null    int64
 8   Sp. Atk     721 non-null    int64
 9   Sp. Def     721 non-null    int64
 10  Speed       721 non-null    int64
 11  Generation  721 non-null    int64
 12  Legendary   721 non-null    bool
dtypes: bool(1), int64(9), object(3)
memory usage: 68.4+ KB
In [5]:
df.describe()
Out[5]:
 | # | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation |
---|---|---|---|---|---|---|---|---|---|
count | 721.00000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 | 721.000000 |
mean | 361.00000 | 417.945908 | 68.380028 | 75.124827 | 70.697642 | 68.848821 | 69.180305 | 65.714286 | 3.323162 |
std | 208.27906 | 109.663671 | 25.848272 | 29.070335 | 29.194941 | 28.898590 | 26.899364 | 27.277920 | 1.669873 |
min | 1.00000 | 180.000000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 | 20.000000 | 5.000000 | 1.000000 |
25% | 181.00000 | 320.000000 | 50.000000 | 54.000000 | 50.000000 | 45.000000 | 50.000000 | 45.000000 | 2.000000 |
50% | 361.00000 | 424.000000 | 65.000000 | 75.000000 | 65.000000 | 65.000000 | 65.000000 | 65.000000 | 3.000000 |
75% | 541.00000 | 499.000000 | 80.000000 | 95.000000 | 85.000000 | 90.000000 | 85.000000 | 85.000000 | 5.000000 |
max | 721.00000 | 720.000000 | 255.000000 | 165.000000 | 230.000000 | 154.000000 | 230.000000 | 160.000000 | 6.000000 |
Distribution of Pokemon Types:
In [ ]:
df['Type 1'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='tab20c', figsize=(10, 8))
Distribution of Pokemon Totals:
In [ ]:
df['Total'].plot(kind='hist', figsize=(10, 8))
In [ ]:
df['Total'].plot(kind='box', vert=False, figsize=(10, 5))
Distribution of Legendary Pokemons:
In [ ]:
df['Legendary'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Basic filtering
Let's start with a few simple activities regarding filtering.
1. How many Pokemons exist with an Attack value greater than 150?
A little visual exploration gives a sense of which pokemons are the most "powerful" (as measured by their "Attack" feature). A boxplot is a good way to see where the high values sit:
In [ ]:
sns.boxplot(data=df, x='Attack')
In [ ]:
# Try your code here
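One possible way to count rows that satisfy a condition is to sum the boolean mask, since each `True` counts as 1. The sketch below is only an illustration: `sample` is a hypothetical five-row frame built from Attack values shown elsewhere in this notebook, and the threshold is lowered to 140 because no row in that sample exceeds 150. On the full `df`, the same pattern would use `df['Attack'] > 150`.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): stands in for the full `df`
# loaded from pokemon.csv. Values are copied from tables in this notebook.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Kyurem", "Haxorus", "Zekrom", "Groudon"],
    "Attack": [49, 130, 147, 150, 150],
})

# A comparison returns a boolean Series; summing it counts the True values.
mask = sample["Attack"] > 140  # threshold lowered for the sample (assumption)
count_via_sum = int(mask.sum())

# Equivalent approach: filter the rows first, then count them.
count_via_len = len(sample.loc[mask])

print(count_via_sum, count_via_len)
```

Note that the comparison is strict: a pokemon with an Attack of exactly 150 does not satisfy `> 150`.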
2. Select all pokemons with a Speed of 10 or less
In [ ]:
sns.boxplot(data=df, x='Speed')
In [36]:
slow_pokemons_df = df.loc[df['Speed']<=10]
3. How many Pokemons have a Sp. Def value of 25 or less?
In [ ]:
# Try your code here
4. Select all the Legendary pokemons
In [38]:
# 'Legendary' is already a boolean column, so it can be used directly as a mask
legendary_df = df.loc[df['Legendary']]
5. Find the outlier
Find the pokemon that is clearly an outlier in terms of Attack / Defense:
In [ ]:
ax = sns.scatterplot(data=df, x="Defense", y="Attack")
ax.annotate(
"Who's this guy?", xy=(228, 10), xytext=(150, 10), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
In [ ]:
# Try your code here
Advanced selection
Now let's use boolean operators to build more advanced filter expressions.
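As a minimal sketch of what combining conditions can look like, the example below uses `&` (and), `|` (or) and `~` (not) on boolean Series. The frame `sample` is a hypothetical subset assembled from rows shown in this notebook, not the full dataset. Each comparison is wrapped in parentheses or stored in a variable first, because `&` and `|` bind more tightly than `==` in Python.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): a few rows copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Zekrom", "Goodra", "Hydreigon", "Noivern", "Groudon"],
    "Type 1": ["Dragon", "Dragon", "Dark", "Flying", "Ground"],
    "Type 2": ["Electric", None, "Dragon", "Dragon", "Fire"],
    "Legendary": [True, False, False, False, True],
})

is_dragon_primary = sample["Type 1"] == "Dragon"
is_dragon_secondary = sample["Type 2"] == "Dragon"  # missing values compare as False

# AND: both conditions must hold.
legendary_dragons = sample.loc[is_dragon_primary & sample["Legendary"]]

# OR: at least one condition must hold.
any_dragon = sample.loc[is_dragon_primary | is_dragon_secondary]

# NOT: invert a boolean mask.
non_legendary = sample.loc[~sample["Legendary"]]

print(legendary_dragons["Name"].tolist())
print(any_dragon["Name"].tolist())
print(non_legendary["Name"].tolist())
```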
6. How many Fire-Flying Pokemons are there?
In [ ]:
# Try your code here
7. How many pokemons have 'Poison' as either of their types?
In [ ]:
# Try your code here
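Besides chaining two comparisons with `|`, one alternative is to compare both type columns at once and ask whether any of them matches. The sketch below assumes a hypothetical four-row `sample` built from rows that appear in this notebook; the count it prints describes that sample only, not the full dataset.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): rows copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Dragalge", "Goodra"],
    "Type 1": ["Grass", "Fire", "Poison", "Dragon"],
    "Type 2": ["Poison", None, "Dragon", None],
})

# Compare both columns to 'Poison' element-wise, then keep rows where
# at least one of the two columns matched.
poison_mask = sample[["Type 1", "Type 2"]].eq("Poison").any(axis=1)
poison_count = int(poison_mask.sum())

print(sample.loc[poison_mask, "Name"].tolist(), poison_count)
```

This gives the same mask as `(sample['Type 1'] == 'Poison') | (sample['Type 2'] == 'Poison')`, which may be easier to read when only two columns are involved.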
8. Which pokemon of Type 1 Ice has the strongest Defense?
In [ ]:
# Try your code here
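A "filter, then pick the maximum" question like this one can be sketched with `idxmax`, which returns the index label of the largest value. The hypothetical `sample` below contains no Ice rows, so `'Dragon'` stands in for the type purely for illustration; on the full `df` the filter would be `df['Type 1'] == 'Ice'`.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): rows copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Zekrom", "Reshiram", "Zygarde50% Forme", "Haxorus", "Groudon"],
    "Type 1": ["Dragon", "Dragon", "Dragon", "Dragon", "Ground"],
    "Defense": [120, 100, 121, 90, 140],
})

# Step 1: restrict to the type of interest ('Dragon' is a stand-in here).
dragons = sample.loc[sample["Type 1"] == "Dragon"]

# Step 2: idxmax gives the index label of the highest Defense in that subset.
strongest_name = dragons.loc[dragons["Defense"].idxmax(), "Name"]

print(strongest_name)
```

Filtering first matters: Groudon has the highest Defense in the sample overall, but it is excluded because its Type 1 is not the one being searched for. An equivalent approach is `sort_values(by='Defense', ascending=False).head(1)`.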
9. What's the most common type of Legendary Pokemons?
In [ ]:
# Try your code here
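One way to approach a "most common value within a subset" question is to filter first and then call `value_counts()`. The sketch below assumes the question refers to `Type 1` only (an assumption, since the heading does not say) and runs on a hypothetical five-row `sample` taken from rows in this notebook. That sample is heavily skewed toward Dragons, so its result says nothing about the full dataset.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): rows copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Zekrom", "Reshiram", "Groudon", "Goodra", "Bulbasaur"],
    "Type 1": ["Dragon", "Dragon", "Ground", "Dragon", "Grass"],
    "Legendary": [True, True, True, False, False],
})

# Select the 'Type 1' values of Legendary rows, then count each type.
type_counts = sample.loc[sample["Legendary"], "Type 1"].value_counts()

# idxmax returns the label (type name) with the highest count.
most_common_type = type_counts.idxmax()

print(type_counts.to_dict(), most_common_type)
```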
10. What's the most powerful pokemon from the first 3 generations, of type water?
In [ ]:
# Try your code here
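As an alternative to `df.loc[...]` with boolean masks, `DataFrame.query` accepts the filter as a string, with backticks around column names that contain spaces. The sketch below is illustrative only: the hypothetical `sample` has no Water rows, so `'Grass'` stands in for the type, and "most powerful" is taken to mean the highest `Total`, which is also an assumption.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): rows copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmeleon", "Groudon", "Goodra"],
    "Type 1": ["Grass", "Grass", "Grass", "Fire", "Ground", "Dragon"],
    "Total": [318, 405, 525, 405, 670, 600],
    "Generation": [1, 1, 1, 1, 3, 6],
})

# Backticks let query() refer to a column whose name contains a space.
# 'Grass' is a stand-in type because the sample has no Water rows.
early_grass = sample.query("Generation <= 3 and `Type 1` == 'Grass'")

# nlargest(1, 'Total') keeps the single row with the highest Total.
top = early_grass.nlargest(1, "Total")

print(top["Name"].iloc[0])
```

If the type may appear in either column, the condition can be extended with `or`, for example ``(`Type 1` == 'Water' or `Type 2` == 'Water')``.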
11. What's the most powerful Dragon from the last two generations?
In [6]:
df['Generation'].value_counts()
Out[6]:
Generation
5    156
1    151
3    135
4    107
2    100
6     72
Name: count, dtype: int64
In [16]:
# Dragons (in either type column) from generations 5 and 6, highest Total first
dragon_mask = (df['Type 1'] == 'Dragon') | (df['Type 2'] == 'Dragon')
df.loc[df['Generation'].isin((5, 6)) & dragon_mask].sort_values(by='Total', ascending=False)
Out[16]:
 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
643 | 644 | Zekrom | Dragon | Electric | 680 | 100 | 150 | 120 | 120 | 100 | 90 | 5 | True |
642 | 643 | Reshiram | Dragon | Fire | 680 | 100 | 120 | 100 | 150 | 120 | 90 | 5 | True |
645 | 646 | Kyurem | Dragon | Ice | 660 | 125 | 130 | 90 | 130 | 90 | 95 | 5 | True |
705 | 706 | Goodra | Dragon | NaN | 600 | 90 | 100 | 70 | 110 | 150 | 80 | 6 | False |
717 | 718 | Zygarde50% Forme | Dragon | Ground | 600 | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True |
634 | 635 | Hydreigon | Dark | Dragon | 600 | 92 | 105 | 90 | 125 | 90 | 98 | 5 | False |
611 | 612 | Haxorus | Dragon | NaN | 540 | 76 | 147 | 90 | 60 | 70 | 97 | 5 | False |
714 | 715 | Noivern | Flying | Dragon | 535 | 85 | 70 | 80 | 97 | 80 | 123 | 6 | False |
696 | 697 | Tyrantrum | Rock | Dragon | 521 | 82 | 121 | 119 | 69 | 59 | 71 | 6 | False |
690 | 691 | Dragalge | Poison | Dragon | 494 | 65 | 75 | 90 | 97 | 123 | 44 | 6 | False |
620 | 621 | Druddigon | Dragon | NaN | 485 | 77 | 120 | 90 | 60 | 90 | 48 | 5 | False |
704 | 705 | Sliggoo | Dragon | NaN | 452 | 68 | 75 | 53 | 83 | 113 | 60 | 6 | False |
633 | 634 | Zweilous | Dark | Dragon | 420 | 72 | 85 | 70 | 65 | 70 | 58 | 5 | False |
610 | 611 | Fraxure | Dragon | NaN | 410 | 66 | 117 | 70 | 40 | 50 | 67 | 5 | False |
695 | 696 | Tyrunt | Rock | Dragon | 362 | 58 | 89 | 77 | 45 | 45 | 48 | 6 | False |
609 | 610 | Axew | Dragon | NaN | 320 | 46 | 87 | 60 | 30 | 40 | 57 | 5 | False |
632 | 633 | Deino | Dark | Dragon | 300 | 52 | 65 | 50 | 45 | 50 | 38 | 5 | False |
703 | 704 | Goomy | Dragon | NaN | 300 | 45 | 50 | 35 | 55 | 75 | 40 | 6 | False |
713 | 714 | Noibat | Flying | Dragon | 245 | 40 | 30 | 35 | 45 | 40 | 55 | 6 | False |
12. Select the most powerful Fire-type pokemons
In [14]:
# "Most powerful" is taken here to mean an Attack above 100; Fire is matched on Type 1 only
powerful_fire_df = df.loc[(df['Type 1'] == 'Fire') & (df['Attack'] > 100)]
13. Select all Water-type, Flying-type pokemons
In [17]:
# Water as the primary type and Flying as the secondary type
water_flying_df = df.loc[(df['Type 1'] == 'Water') & (df['Type 2'] == 'Flying')]
14. Select specific columns of Legendary pokemons of type Fire
In [21]:
# Row filter first, then the list of columns to keep
legendary_fire_df = df.loc[(df['Type 1'] == 'Fire') & (df['Legendary']), ['Name', 'Attack', 'Generation']]
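15. Select Slow and Fast pokemons
The histogram below shows the distribution of Speed. The red lines mark the 5th and 95th percentiles: pokemons to the left of the first line are the slowest 5%, and those to the right of the second line are the fastest 5%: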
In [23]:
ax = df['Speed'].plot(kind='hist', figsize=(10, 5), bins=100)
ax.axvline(df['Speed'].quantile(.05), color='red')
ax.axvline(df['Speed'].quantile(.95), color='red')
Out[23]:
<matplotlib.lines.Line2D at 0x73d63e579150>
In [33]:
# Keep pokemons strictly below the 5th or strictly above the 95th Speed percentile
slow_fast_df = df.loc[(df['Speed'] < df['Speed'].quantile(0.05)) | (df['Speed'] > df['Speed'].quantile(0.95))]
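The same "outside both tails" selection can also be sketched with `Series.between`, computing each quantile only once. The frame `sample` below is a hypothetical ten-row subset with Speed values copied from rows in this notebook, so its cut-offs differ from those of the full dataset. Because `between` is inclusive on both ends by default, negating it with `~` yields exactly the strict `<` / `>` condition.

```python
import pandas as pd

# Hypothetical mini-sample (assumption): Speed values copied from this notebook.
sample = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon",
             "Noivern", "Deino", "Dragalge", "Zekrom", "Groudon"],
    "Speed": [45, 60, 80, 65, 80, 123, 38, 44, 90, 90],
})

# Compute both cut-offs in one call; the result unpacks into two floats.
low, high = sample["Speed"].quantile([0.05, 0.95])

# Explicit version: strictly below the low cut-off or strictly above the high one.
explicit = sample.loc[(sample["Speed"] < low) | (sample["Speed"] > high)]

# Equivalent version: everything NOT inside the inclusive [low, high] range.
via_between = sample.loc[~sample["Speed"].between(low, high)]

print(via_between["Name"].tolist())
```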
16. Find the Ultra Powerful Legendary Pokemon
In [ ]:
fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=df, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
"Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
arrowprops=dict(arrowstyle="->", color='red')
)
In [35]:
# The annotated point sits near Defense 140 and Attack 150, so filter around that region
df.loc[(df['Attack'] > 140) & (df['Defense'] > 120)].sort_values(by='Defense', ascending=False)
Out[35]:
 | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
382 | 383 | Groudon | Ground | Fire | 670 | 100 | 150 | 140 | 100 | 90 | 90 | 3 | True |