Project: Querying and Filtering Pokemon data¶

This project will help you practice your pandas querying and filtering skills. Let's begin!

No description has been provided for this image

Task 0 - Setup¶

There isn't much to do here, we'll provide the required imports and the read the pokemon CSV we'll be working with.

In [32]:

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

In [33]:

df = pd.read_csv("pokemon.csv")

In [34]:

df.head()

Out[34]:

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	1	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	1	False
3	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False
4	5	Charmeleon	Fire	NaN	405	58	64	58	80	65	80	1	False

In [35]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 721 entries, 0 to 720
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   #           721 non-null    int64 
 1   Name        721 non-null    object
 2   Type 1      721 non-null    object
 3   Type 2      359 non-null    object
 4   Total       721 non-null    int64 
 5   HP          721 non-null    int64 
 6   Attack      721 non-null    int64 
 7   Defense     721 non-null    int64 
 8   Sp. Atk     721 non-null    int64 
 9   Sp. Def     721 non-null    int64 
 10  Speed       721 non-null    int64 
 11  Generation  721 non-null    int64 
 12  Legendary   721 non-null    bool  
dtypes: bool(1), int64(9), object(3)
memory usage: 68.4+ KB

In [36]:

df.describe()

Out[36]:

	#	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation
count	721.00000	721.000000	721.000000	721.000000	721.000000	721.000000	721.000000	721.000000	721.000000
mean	361.00000	417.945908	68.380028	75.124827	70.697642	68.848821	69.180305	65.714286	3.323162
std	208.27906	109.663671	25.848272	29.070335	29.194941	28.898590	26.899364	27.277920	1.669873
min	1.00000	180.000000	1.000000	5.000000	5.000000	10.000000	20.000000	5.000000	1.000000
25%	181.00000	320.000000	50.000000	54.000000	50.000000	45.000000	50.000000	45.000000	2.000000
50%	361.00000	424.000000	65.000000	75.000000	65.000000	65.000000	65.000000	65.000000	3.000000
75%	541.00000	499.000000	80.000000	95.000000	85.000000	90.000000	85.000000	85.000000	5.000000
max	721.00000	720.000000	255.000000	165.000000	230.000000	154.000000	230.000000	160.000000	6.000000

Distribution of Pokemon Types:¶

In [37]:

df['Type 1'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='tab20c', figsize=(10, 8))

Out[37]:

<Axes: ylabel='count'>

Distribution of Pokemon Totals:¶

In [38]:

df['Total'].plot(kind='hist', figsize=(10, 8))

Out[38]:

<Axes: ylabel='Frequency'>

In [40]:

df['Total'].plot(kind='box', vert=False, figsize=(10, 5))

Out[40]:

<Axes: >

Distribution of Legendary Pokemons:¶

In [41]:

df['Legendary'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))

Out[41]:

<Axes: ylabel='count'>

Basic filtering¶

Let's start with a few simple activities regarding filtering.

1. How many Pokemons exist with an `Attack` value greater than 150?¶

Doing a little bit of visual exploration, we can have a sense of the most "powerful" pokemons (defined by their "Attack" feature). A boxplot is a great way to visualize this:

In [42]:

sns.boxplot(data=df, x='Attack')

Out[42]:

<Axes: xlabel='Attack'>

In [43]:

# Try your code here

2. Select all pokemons with a Speed of `10` or less¶

In [44]:

sns.boxplot(data=df, x='Speed')

Out[44]:

<Axes: xlabel='Speed'>

In [45]:

slow_pokemons_df = df.loc[df['Speed']<=10]

3. How many Pokemons have a `Sp. Def` value of 25 or less?¶

In [46]:

df.loc[df['Sp. Def']<=25].shape

Out[46]:

(17, 13)

4. Select all the Legendary pokemons¶

In [47]:

# Try your code here
legendary_df = df.loc[df['Legendary']]

5. Find the outlier¶

Find the pokemon that is clearly an outlier in terms of Attack / Defense:

In [48]:

ax = sns.scatterplot(data=df, x="Defense", y="Attack")
ax.annotate(
    "Who's this guy?", xy=(228, 10), xytext=(150, 10), color='red',
    arrowprops=dict(arrowstyle="->", color='red')
)

Out[48]:

Text(150, 10, "Who's this guy?")

In [49]:

df.query("Defense > 200")

Out[49]:

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
212	213	Shuckle	Bug	Rock	505	20	10	230	10	230	5	2	False

Advanced selection¶

Now let's use boolean operators to create more advanced expressions

6. How many Fire-Flying Pokemons are there?¶

In [50]:

>>> df.loc[df['Type 1'] == 'Fire', 'Type 2'].value_counts()

Out[50]:

Type 2
Fighting    6
Flying      5
Ground      2
Psychic     2
Normal      2
Rock        1
Steel       1
Water       1
Name: count, dtype: int64

7. How many 'Poison' pokemons are across both types?¶

In [51]:

>>> df.loc[(df['Type 1'] == 'Poison') | (df['Type 2'] == 'Poison')].shape

Out[51]:

(59, 13)

8. Name the pokemon of `Type 1` Ice which has the strongest defense?¶

In [52]:

df.loc[df['Type 1'] == 'Ice'].sort_values(by='Defense', ascending=False)
# Try your code here

Out[52]:

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
712	713	Avalugg	Ice	NaN	514	95	117	184	44	46	28	6	False
470	471	Glaceon	Ice	NaN	525	65	60	110	130	95	65	4	False
143	144	Articuno	Ice	Flying	580	90	85	100	95	125	85	1	True
377	378	Regice	Ice	NaN	580	80	50	100	100	200	50	3	True
364	365	Walrein	Ice	Water	530	110	80	90	95	90	65	3	False
711	712	Bergmite	Ice	NaN	304	55	69	85	32	35	28	6	False
583	584	Vanilluxe	Ice	NaN	535	71	95	85	110	95	79	5	False
613	614	Beartic	Ice	NaN	485	95	110	80	70	80	50	5	False
220	221	Piloswine	Ice	Ground	450	100	100	80	60	60	50	2	False
472	473	Mamoswine	Ice	Ground	530	110	130	80	70	60	80	4	False
361	362	Glalie	Ice	NaN	480	80	80	80	80	80	80	3	False
477	478	Froslass	Ice	Ghost	480	70	80	70	80	70	110	4	False
363	364	Sealeo	Ice	Water	410	90	60	70	75	70	45	3	False
582	583	Vanillish	Ice	NaN	395	51	65	65	80	75	59	5	False
360	361	Snorunt	Ice	NaN	300	50	50	50	50	50	50	3	False
362	363	Spheal	Ice	Water	290	70	40	50	55	50	25	3	False
581	582	Vanillite	Ice	NaN	305	36	50	50	65	60	44	5	False
224	225	Delibird	Ice	Flying	330	45	55	45	65	45	75	2	False
219	220	Swinub	Ice	Ground	250	50	50	40	30	30	50	2	False
612	613	Cubchoo	Ice	NaN	305	55	70	40	60	40	40	5	False
123	124	Jynx	Ice	Psychic	455	65	50	35	115	95	95	1	False
614	615	Cryogonal	Ice	NaN	485	70	50	30	95	135	105	5	False
237	238	Smoochum	Ice	Psychic	305	45	30	15	85	65	65	2	False

9. What's the most common type of Legendary Pokemons?¶

In [53]:

df.loc[df['Legendary'], 'Type 1'].value_counts()

Out[53]:

Type 1
Psychic     8
Dragon      7
Fire        5
Steel       4
Water       3
Rock        3
Electric    3
Ice         2
Ground      2
Normal      2
Dark        2
Grass       2
Ghost       1
Flying      1
Fairy       1
Name: count, dtype: int64

10. What's the most powerful pokemon from the first 3 generations, of type water?¶

In [54]:

df.loc[df['Generation'].isin((1, 2, 3)) & (df['Type 1'] == 'Water')].sort_values(by='Total', ascending=False).head()

Out[54]:

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
381	382	Kyogre	Water	NaN	670	100	100	90	150	140	90	3	True
244	245	Suicune	Water	NaN	580	100	75	115	90	115	85	2	True
229	230	Kingdra	Water	Dragon	540	75	95	95	95	95	85	2	False
129	130	Gyarados	Water	Flying	540	95	125	79	60	100	81	1	False
349	350	Milotic	Water	NaN	540	95	60	79	100	125	81	3	False

11. What's the most powerful Dragon from the last two generations?¶

In [55]:

df.loc[
    ((df['Type 1'] == 'Dragon') | (df['Type 2'] == 'Dragon')) &
    df['Generation'].isin((5, 6))
].sort_values(by='Total', ascending=False).head()

Out[55]:

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Generation	Legendary
643	644	Zekrom	Dragon	Electric	680	100	150	120	120	100	90	5	True
642	643	Reshiram	Dragon	Fire	680	100	120	100	150	120	90	5	True
645	646	Kyurem	Dragon	Ice	660	125	130	90	130	90	95	5	True
705	706	Goodra	Dragon	NaN	600	90	100	70	110	150	80	6	False
717	718	Zygarde50% Forme	Dragon	Ground	600	108	100	121	81	95	95	6	True

12. Select most powerful Fire-type pokemons¶

In [56]:

powerful_fire_df = df.loc[
    (df['Type 1'] == 'Fire') &
    (df['Attack'] > 100)
]

13. Select all Water-type, Flying-type pokemons¶

In [58]:

water_flying_df = df.query("`Type 1` == 'Water' and `Type 2` == 'Flying'")

14. Select specific columns of Legendary pokemons of type Fire¶

In [60]:

legendary_fire_df = df.loc[
    df['Legendary'] & (df['Type 1'] == 'Fire'),  # Condition (or index-level)
    ['Name', 'Attack', 'Generation']             # Columns
]

15. Select Slow and Fast pokemons¶

This is the distribution of speed of the pokemons. The red lines indicate those bottom 5% and top 5% pokemons by speed:

In [ ]:

ax = df['Speed'].plot(kind='hist', figsize=(10, 5), bins=100)
ax.axvline(df['Speed'].quantile(.05), color='red')
ax.axvline(df['Speed'].quantile(.95), color='red')

In [ ]:

slow_fast_df = df.loc[
    (df['Speed'] < df['Speed'].quantile(.05)) |
    (df['Speed'] > df['Speed'].quantile(.95))
]

16. Find the Ultra Powerful Legendary Pokemon¶

In [ ]:

fig, ax = plt.subplots(figsize=(14, 7))
sns.scatterplot(data=df, x="Defense", y="Attack", hue='Legendary', ax=ax)
ax.annotate(
    "Who's this guy?", xy=(140, 150), xytext=(160, 150), color='red',
    arrowprops=dict(arrowstyle="->", color='red')
)

In [64]:

df.loc[
    df['Legendary'] &
    (df['Attack'] > 140)
, ['Name', 'Attack', 'Defense']].sort_values(by=['Defense'], ascending=False)

Out[64]:

	Name	Attack	Defense
382	Groudon	150	140
643	Zekrom	150	120
485	Regigigas	160	110
383	Rayquaza	150	90
385	DeoxysNormal Forme	150	50

Statement of Completion#c2f0a305

Intro to Pandas for Data Analysis

Practicing filtering sorting with Pokemon

Project: Querying and Filtering Pokemon data¶

Task 0 - Setup¶

Distribution of Pokemon Types:¶

Distribution of Pokemon Totals:¶

Distribution of Legendary Pokemons:¶

Basic filtering¶

1. How many Pokemons exist with an `Attack` value greater than 150?¶

2. Select all pokemons with a Speed of `10` or less¶

3. How many Pokemons have a `Sp. Def` value of 25 or less?¶

4. Select all the Legendary pokemons¶

5. Find the outlier¶

Advanced selection¶

6. How many Fire-Flying Pokemons are there?¶

7. How many 'Poison' pokemons are across both types?¶

8. Name the pokemon of `Type 1` Ice which has the strongest defense?¶

9. What's the most common type of Legendary Pokemons?¶

10. What's the most powerful pokemon from the first 3 generations, of type water?¶

11. What's the most powerful Dragon from the last two generations?¶

12. Select most powerful Fire-type pokemons¶

13. Select all Water-type, Flying-type pokemons¶

14. Select specific columns of Legendary pokemons of type Fire¶

15. Select Slow and Fast pokemons¶

16. Find the Ultra Powerful Legendary Pokemon¶

The End!¶

Statement of Completion#c2f0a305

Intro to Pandas for Data Analysis

Practicing filtering sorting with Pokemon

Project: Querying and Filtering Pokemon data¶

Task 0 - Setup¶

Distribution of Pokemon Types:¶

Distribution of Pokemon Totals:¶

Distribution of Legendary Pokemons:¶

Basic filtering¶

1. How many Pokemons exist with an Attack value greater than 150?¶

2. Select all pokemons with a Speed of 10 or less¶

3. How many Pokemons have a Sp. Def value of 25 or less?¶

4. Select all the Legendary pokemons¶

5. Find the outlier¶

Advanced selection¶

6. How many Fire-Flying Pokemons are there?¶

7. How many 'Poison' pokemons are across both types?¶

8. Name the pokemon of Type 1 Ice which has the strongest defense?¶

9. What's the most common type of Legendary Pokemons?¶

10. What's the most powerful pokemon from the first 3 generations, of type water?¶

11. What's the most powerful Dragon from the last two generations?¶

12. Select most powerful Fire-type pokemons¶

13. Select all Water-type, Flying-type pokemons¶

14. Select specific columns of Legendary pokemons of type Fire¶

15. Select Slow and Fast pokemons¶

16. Find the Ultra Powerful Legendary Pokemon¶

The End!¶

1. How many Pokemons exist with an `Attack` value greater than 150?¶

2. Select all pokemons with a Speed of `10` or less¶

3. How many Pokemons have a `Sp. Def` value of 25 or less?¶

8. Name the pokemon of `Type 1` Ice which has the strongest defense?¶