Statement of Completion#4c2b767f
Intro to Pandas for Data Analysis
easy
Practicing Filtering and Selection with FIFA data
Resolution
Activities
Project.ipynb
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
df = pd.read_csv('players_22.csv',dtype={25: str, 108: str})
Let's Analyze our DataFrame¶
we will see the first few rows of the dataframe using df.head()
, if we want to check the last few rows then we will use df.tail()
,
After that, we will check the names of columns using df.columns
, check the datatype, and values in each column using df.info()
.
Lastly, to find the statistical features we use df.describe()
.
In [3]:
df.head()
Out[3]:
sofifa_id | player_url | short_name | long_name | player_positions | overall | potential | value_eur | wage_eur | age | ... | lcb | cb | rcb | rb | gk | player_face_url | club_logo_url | club_flag_url | nation_logo_url | nation_flag_url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 158023 | https://sofifa.com/player/158023/lionel-messi/... | L. Messi | Lionel Andrés Messi Cuccittini | RW, ST, CF | 93 | 93 | 78000000.0 | 320000.0 | 34 | ... | 50+3 | 50+3 | 50+3 | 61+3 | 19+3 | https://cdn.sofifa.net/players/158/023/22_120.png | https://cdn.sofifa.net/teams/73/60.png | https://cdn.sofifa.net/flags/fr.png | https://cdn.sofifa.net/teams/1369/60.png | https://cdn.sofifa.net/flags/ar.png |
1 | 188545 | https://sofifa.com/player/188545/robert-lewand... | R. Lewandowski | Robert Lewandowski | ST | 92 | 92 | 119500000.0 | 270000.0 | 32 | ... | 60+3 | 60+3 | 60+3 | 61+3 | 19+3 | https://cdn.sofifa.net/players/188/545/22_120.png | https://cdn.sofifa.net/teams/21/60.png | https://cdn.sofifa.net/flags/de.png | https://cdn.sofifa.net/teams/1353/60.png | https://cdn.sofifa.net/flags/pl.png |
2 | 20801 | https://sofifa.com/player/20801/c-ronaldo-dos-... | Cristiano Ronaldo | Cristiano Ronaldo dos Santos Aveiro | ST, LW | 91 | 91 | 45000000.0 | 270000.0 | 36 | ... | 53+3 | 53+3 | 53+3 | 60+3 | 20+3 | https://cdn.sofifa.net/players/020/801/22_120.png | https://cdn.sofifa.net/teams/11/60.png | https://cdn.sofifa.net/flags/gb-eng.png | https://cdn.sofifa.net/teams/1354/60.png | https://cdn.sofifa.net/flags/pt.png |
3 | 190871 | https://sofifa.com/player/190871/neymar-da-sil... | Neymar Jr | Neymar da Silva Santos Júnior | LW, CAM | 91 | 91 | 129000000.0 | 270000.0 | 29 | ... | 50+3 | 50+3 | 50+3 | 62+3 | 20+3 | https://cdn.sofifa.net/players/190/871/22_120.png | https://cdn.sofifa.net/teams/73/60.png | https://cdn.sofifa.net/flags/fr.png | NaN | https://cdn.sofifa.net/flags/br.png |
4 | 192985 | https://sofifa.com/player/192985/kevin-de-bruy... | K. De Bruyne | Kevin De Bruyne | CM, CAM | 91 | 91 | 125500000.0 | 350000.0 | 30 | ... | 69+3 | 69+3 | 69+3 | 75+3 | 21+3 | https://cdn.sofifa.net/players/192/985/22_120.png | https://cdn.sofifa.net/teams/10/60.png | https://cdn.sofifa.net/flags/gb-eng.png | https://cdn.sofifa.net/teams/1325/60.png | https://cdn.sofifa.net/flags/be.png |
5 rows × 110 columns
In [4]:
df.columns
Out[4]:
Index(['sofifa_id', 'player_url', 'short_name', 'long_name', 'player_positions', 'overall', 'potential', 'value_eur', 'wage_eur', 'age', ... 'lcb', 'cb', 'rcb', 'rb', 'gk', 'player_face_url', 'club_logo_url', 'club_flag_url', 'nation_logo_url', 'nation_flag_url'], dtype='object', length=110)
In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 19239 entries, 0 to 19238 Columns: 110 entries, sofifa_id to nation_flag_url dtypes: float64(16), int64(44), object(50) memory usage: 16.1+ MB
In [6]:
df.describe()
Out[6]:
sofifa_id | overall | potential | value_eur | wage_eur | age | height_cm | weight_kg | club_team_id | league_level | ... | mentality_composure | defending_marking_awareness | defending_standing_tackle | defending_sliding_tackle | goalkeeping_diving | goalkeeping_handling | goalkeeping_kicking | goalkeeping_positioning | goalkeeping_reflexes | goalkeeping_speed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 19239.000000 | 19239.000000 | 19239.000000 | 1.916500e+04 | 19178.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19178.000000 | 19178.000000 | ... | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 19239.000000 | 2132.000000 |
mean | 231468.086959 | 65.772182 | 71.079370 | 2.850452e+06 | 9017.989363 | 25.210822 | 181.299704 | 74.943032 | 50580.498123 | 1.354364 | ... | 57.929830 | 46.601746 | 48.045584 | 45.906700 | 16.406102 | 16.192474 | 16.055356 | 16.229274 | 16.491814 | 36.439962 |
std | 27039.717497 | 6.880232 | 6.086213 | 7.613700e+06 | 19470.176724 | 4.748235 | 6.863179 | 7.069434 | 54401.868535 | 0.747865 | ... | 12.159326 | 20.200807 | 21.232718 | 20.755683 | 17.574028 | 16.839528 | 16.564554 | 17.059779 | 17.884833 | 10.751563 |
min | 41.000000 | 47.000000 | 49.000000 | 9.000000e+03 | 500.000000 | 16.000000 | 155.000000 | 49.000000 | 1.000000 | 1.000000 | ... | 12.000000 | 4.000000 | 5.000000 | 5.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 | 2.000000 | 15.000000 |
25% | 214413.500000 | 61.000000 | 67.000000 | 4.750000e+05 | 1000.000000 | 21.000000 | 176.000000 | 70.000000 | 479.000000 | 1.000000 | ... | 50.000000 | 29.000000 | 28.000000 | 25.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 8.000000 | 27.000000 |
50% | 236543.000000 | 66.000000 | 71.000000 | 9.750000e+05 | 3000.000000 | 25.000000 | 181.000000 | 75.000000 | 1938.000000 | 1.000000 | ... | 59.000000 | 52.000000 | 56.000000 | 53.000000 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 11.000000 | 36.000000 |
75% | 253532.500000 | 70.000000 | 75.000000 | 2.000000e+06 | 8000.000000 | 29.000000 | 186.000000 | 80.000000 | 111139.000000 | 1.000000 | ... | 66.000000 | 63.000000 | 65.000000 | 63.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 14.000000 | 45.000000 |
max | 264640.000000 | 93.000000 | 95.000000 | 1.940000e+08 | 350000.000000 | 54.000000 | 206.000000 | 110.000000 | 115820.000000 | 5.000000 | ... | 96.000000 | 93.000000 | 93.000000 | 92.000000 | 91.000000 | 92.000000 | 93.000000 | 92.000000 | 90.000000 | 65.000000 |
8 rows × 60 columns
Some visualizations to better understand the dataset¶
In [7]:
df['age'].plot(kind='hist', figsize=(10, 8))
Out[7]:
<Axes: ylabel='Frequency'>
In [8]:
df['overall'].plot(kind='box', vert=False, figsize=(10, 5))
Out[8]:
<Axes: >
In [9]:
sns.distplot(df['age']);
/tmp/ipykernel_17/1462740167.py:1: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot(df['age']);
In [10]:
df['international_reputation'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Out[10]:
<Axes: ylabel='count'>
In [11]:
df['nationality_name'].value_counts().head(10).plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Out[11]:
<Axes: ylabel='count'>
In [12]:
df['work_rate'].value_counts().plot(kind='pie', autopct='%1.1f%%', cmap='Set3', figsize=(10, 8))
Out[12]:
<Axes: ylabel='count'>
Filtering¶
Below we will do some basic as well as advance filtering on our FIFA Players 22 dataset.
1. Select all the rows and the first 5 columns from the dataframe and store the result in the variable df_first_five_cols
In [13]:
df_first_five_cols = df.iloc[:,:5]
In [ ]:
2. Select Specific Columns in a Custom Order
In [21]:
df_order_cols = df.iloc[:,[3,6,4,1]]
In [ ]:
3. Filter out 50 rows from the Colum number 3,6,4,1 into the variable fifty_rows
In [23]:
fifty_rows = df.iloc[:50,[3,6,4,1]]
In [ ]:
4. Select the all rows and the Columns ['short_name','age','player_positions', 'overall', 'value_eur']
from the dataframe and store your selection in the variable named_cols
In [25]:
named_cols= df[["short_name","age","player_positions","overall","value_eur"]]
In [ ]:
5. Filter out the first 14 rows and the Columns ['short_name','age','player_positions', 'overall', 'value_eur']
from the dataframe and store the result in the variable named_cols_rows
In [27]:
named_cols_rows = df.iloc[:14][['short_name',"age","player_positions","overall","value_eur"]]
In [ ]:
6. Find out how many players of overall
rating greater than 90 exists in the dataset?
In [36]:
df[df['overall'] > 90].shape
Out[36]:
(7, 110)
7. Find out how many players of value_eur
of less than or equal to 20000 in the dataset?
In [37]:
df[df['value_eur']<=20000].shape
Out[37]:
(10, 110)
8. Select the long_name
and age
of the players having the club_name='Manchester City'
and store your selection in the variable name long_name_age
In [43]:
long_name_age = df[df['club_name']=='Manchester City'][['long_name','age']]
In [44]:
long_name_age
Out[44]:
long_name | age | |
---|---|---|
4 | Kevin De Bruyne | 30 |
18 | Ederson Santana de Moraes | 27 |
27 | Raheem Sterling | 26 |
46 | Rúben dos Santos Gato Alves Dias | 24 |
58 | Riyad Mahrez | 30 |
62 | João Pedro Cavaco Cancelo | 27 |
63 | Aymeric Laporte | 27 |
65 | Bernardo Mota Veiga de Carvalho e Silva | 26 |
67 | Rodrigo Hernández Cascante | 25 |
78 | İlkay Gündoğan | 30 |
80 | Kyle Walker | 31 |
118 | Jack Grealish | 25 |
139 | Philip Foden | 21 |
140 | Fernando Luiz Rosa | 36 |
166 | John Stones | 27 |
191 | Gabriel Fernando de Jesus | 24 |
280 | Ferran Torres GarcÃa | 21 |
477 | Oleksandr Zinchenko | 24 |
569 | Benjamin Mendy | 26 |
745 | Nathan Aké | 26 |
1075 | Zack Thomas Steffen | 26 |
7487 | Scott Carson | 35 |
9771 | Kayky da Silva Chagas | 18 |
11543 | Luke Bolton | 21 |
12055 | Cole Palmer | 19 |
12095 | Liam Delap | 18 |
12113 | Jayden Jezairo Braaf | 18 |
14275 | Luke Mbete | 17 |
14349 | Romeo Lavia | 17 |
14364 | Samuel Edozie | 18 |
15157 | James McAtee | 18 |
17084 | Iker Pozo La Rosa | 20 |
9. Filter out the players of Liverpool
club and who are from Brazil
and store them in the variable liverpool_brazil
Note: club_name
& nationality_name
are the column names
In [49]:
liverpool_brazil = df[(df['club_name'] == 'Liverpool') & (df['nationality_name'] == 'Brazil')]
In [50]:
liverpool_brazil
Out[50]:
sofifa_id | player_url | short_name | long_name | player_positions | overall | potential | value_eur | wage_eur | age | ... | lcb | cb | rcb | rb | gk | player_face_url | club_logo_url | club_flag_url | nation_logo_url | nation_flag_url | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 212831 | https://sofifa.com/player/212831/alisson-ramse... | Alisson | Alisson Ramsés Becker | GK | 89 | 90 | 82000000.0 | 190000.0 | 28 | ... | 31+3 | 31+3 | 31+3 | 30+3 | 87+3 | https://cdn.sofifa.net/players/212/831/22_120.png | https://cdn.sofifa.net/teams/9/60.png | https://cdn.sofifa.net/flags/gb-eng.png | NaN | https://cdn.sofifa.net/flags/br.png |
61 | 209499 | https://sofifa.com/player/209499/fabio-henriqu... | Fabinho | Fábio Henrique Tavares | CDM, CB | 86 | 88 | 73500000.0 | 165000.0 | 27 | ... | 84+3 | 84+3 | 84+3 | 82+3 | 18+3 | https://cdn.sofifa.net/players/209/499/22_120.png | https://cdn.sofifa.net/teams/9/60.png | https://cdn.sofifa.net/flags/gb-eng.png | NaN | https://cdn.sofifa.net/flags/br.png |
85 | 201942 | https://sofifa.com/player/201942/roberto-firmi... | Roberto Firmino | Roberto Firmino Barbosa de Oliveira | CF | 85 | 85 | 54000000.0 | 185000.0 | 29 | ... | 67+3 | 67+3 | 67+3 | 69+3 | 17+3 | https://cdn.sofifa.net/players/201/942/22_120.png | https://cdn.sofifa.net/teams/9/60.png | https://cdn.sofifa.net/flags/gb-eng.png | NaN | https://cdn.sofifa.net/flags/br.png |
17275 | 259430 | https://sofifa.com/player/259430/marcelo-pital... | Marcelo Pitaluga | Marcelo Pitaluga | GK | 57 | 78 | 475000.0 | 1000.0 | 18 | ... | 21+2 | 21+2 | 21+2 | 18+2 | 56+2 | https://cdn.sofifa.net/players/259/430/22_120.png | https://cdn.sofifa.net/teams/9/60.png | https://cdn.sofifa.net/flags/gb-eng.png | NaN | https://cdn.sofifa.net/flags/br.png |
4 rows × 110 columns
10. How many players are there who have either mentality_aggression > 91
or power_stamina < 80
Note: Using query method will be easy
In [53]:
df[(df['mentality_aggression']>91) | (df['power_stamina']<80)].shape
Out[53]:
(16919, 110)
11. Select all players from France who have either mentality_aggression > 91
or power_stamina < 80
and store the filtered data in the variable france_player
In [54]:
france_data = df[df['nationality_name']=='France']
In [55]:
france_player = france_data[(france_data['mentality_aggression']>91) | (france_data['power_stamina']<80)]
In [ ]:
12. Identify Left-Footed Players and Display Their Details
In [61]:
left_foot_players = df[df['preferred_foot']=='Left'][['short_name','age','preferred_foot']]
In [ ]: