Statement of Completion#82859019
Data Wrangling with Pandas
medium
Player Stats: Wrangling with Football Data
Resolution
Activities
In [1]:
import pandas as pd
import numpy as np
import warnings
In [2]:
warnings.filterwarnings('ignore')
In [3]:
df = pd.read_csv('football_players_data.csv')
In [4]:
df.head(1)
Out[4]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Wage | Current Ability | Potential Ability | Vision | Stamina | Pace | Determination | Concentration | Balance | Acceleration | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 1200000 | 195 | 200 | 20 | 13 | 15 | 20 | 13 | 19 | 18 |
1 rows × 23 columns
In [ ]:
df.info()
Basic GroupBy Activities¶
1) Extract Players from Argentine and Store in DataFrame named players_arg
¶
In [5]:
players_arg = df[df['Nation']=='ARG'] # Enter your code here
2) Extract Players with Preferred Foot as Either
and Store in players_preferred_foot_either
¶
In [7]:
players_preferred_foot_either = df[df['Preferred Foot']=='Either'] # Enter your code here
3) Identify the Club with the Maximum Number of Players with Preferred Foot as Either
¶
In [15]:
players_preferred_foot_either.groupby('Club').size().idxmax()
Out[15]:
'Al-Hilal (KSA)'
4) Compute Average Age of Players per Club¶
In [16]:
avg_age_per_club = df.groupby('Club')[['Age']].mean()
avg_age_per_club
Out[16]:
Age | |
---|---|
Club | |
1. FC Heidenheim 1846 | 26.000000 |
1. FC Köln | 23.692308 |
1. FC Köln II | 33.000000 |
1. FC Nürnberg | 25.272727 |
1860 Munich | 25.500000 |
... | ... |
Östersunds FK | 26.000000 |
Újpest | 25.000000 |
Ümraniyespor | 27.750000 |
İstanbulspor | 25.750000 |
Śląsk Wrocław | 24.857143 |
1369 rows × 1 columns
5) Calculate Total Value of Players per Division¶
In [18]:
total_value_per_division = df.groupby('Division')[['Value']].sum()
total_value_per_division
Out[18]:
Value | |
---|---|
Division | |
3. Liga | 64650000 |
3F Superliga | 144725000 |
Albanian Superleague | 1900000 |
Angolan Championship | 375000 |
Argentine Metropolitan B Zone | 33975000 |
... | ... |
United States Soccer League Division Two | 9975000 |
Unknown | 89350000 |
Uruguayan First Division | 41650000 |
Uruguayan Second Professional Division | 3625000 |
Vanarama National League North | 400000 |
193 rows × 1 columns
6) Find Division with Players of Highest Value¶
In [20]:
total_value_per_division.idxmax()
Out[20]:
Value English Premier Division dtype: object
7)Count Players by Preferred Foot and Store in preferred_foot_counts
¶
In [36]:
preferred_foot_counts = df.groupby('Preferred Foot').size().reset_index(name='counts')
preferred_foot_counts
Out[36]:
Preferred Foot | counts | |
---|---|---|
0 | Either | 482 |
1 | Left | 2423 |
2 | Left Only | 927 |
3 | Right | 8389 |
4 | Right Only | 1766 |
8) Find Out Which Foot is Preferred by the Majority of Players¶
In [39]:
preferred_foot_counts.loc[preferred_foot_counts['counts'].idxmax()]
Out[39]:
Preferred Foot Right counts 8389 Name: 3, dtype: object
9) Determine Maximum Wage by Nation¶
In [41]:
max_wage_per_nation = df.groupby('Nation')[['Wage']].max()
max_wage_per_nation
Out[41]:
Wage | |
---|---|
Nation | |
ALB | 49000 |
ALG | 160000 |
ANG | 30500 |
ARG | 1200000 |
ARM | 180000 |
... | ... |
VEN | 45500 |
WAL | 525000 |
YEM | 6000 |
ZAM | 31000 |
ZIM | 41500 |
161 rows × 1 columns
10) What is the Country with the Highest Wage Offering?¶
In [43]:
max_wage_per_nation.idxmax()
Out[43]:
Wage ARG dtype: object
11) Calculate the average height and weight for each nation and store the results in a dataframe named avg_height_weight_per_nation
¶
In [45]:
avg_height_weight_per_nation = df.groupby('Nation')[['Height', 'Weight']].mean()
avg_height_weight_per_nation
Out[45]:
Height | Weight | |
---|---|---|
Nation | ||
ALB | 181.897436 | 74.897436 |
ALG | 181.678571 | 76.107143 |
ANG | 178.411765 | 73.235294 |
ARG | 178.717472 | 74.137015 |
ARM | 181.500000 | 74.166667 |
... | ... | ... |
VEN | 180.297872 | 74.255319 |
WAL | 180.870968 | 73.870968 |
YEM | 178.000000 | 69.000000 |
ZAM | 179.750000 | 75.750000 |
ZIM | 180.777778 | 70.777778 |
161 rows × 2 columns
12) Determine the nation with the lowest average height and the nation with the highest average weight among players¶
In [49]:
print('Lowest Height', avg_height_weight_per_nation['Height'].idxmin())
print('Heighest Weight', avg_height_weight_per_nation['Weight'].idxmax())
Lowest Height LES Heighest Weight GUF
13) Calculate the total stamina for each club and store the result in total_stamina_per_club
¶
In [51]:
total_stamina_per_club = df.groupby('Club')[['Stamina']].sum()
total_stamina_per_club
Out[51]:
Stamina | |
---|---|
Club | |
1. FC Heidenheim 1846 | 170 |
1. FC Köln | 352 |
1. FC Köln II | 12 |
1. FC Nürnberg | 280 |
1860 Munich | 25 |
... | ... |
Östersunds FK | 13 |
Újpest | 12 |
Ümraniyespor | 46 |
İstanbulspor | 49 |
Śląsk Wrocław | 87 |
1369 rows × 1 columns
14) Which Club's Players Exhibit Remarkable Stamina Levels?¶
In [53]:
total_stamina_per_club.idxmax()
Out[53]:
Stamina Vélez dtype: object
15) Who is the player with most stamina in Club Barcelona
¶
In [55]:
df.head(1)
Out[55]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Wage | Current Ability | Potential Ability | Vision | Stamina | Pace | Determination | Concentration | Balance | Acceleration | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 1200000 | 195 | 200 | 20 | 13 | 15 | 20 | 13 | 19 | 18 |
1 rows × 23 columns
In [67]:
club_barcelona = df[df['Club']=='Barcelona']
staminas = club_barcelona.groupby('Name')[['Stamina']].sum()
staminas.idxmax()
Out[67]:
Stamina Sergi Roberto dtype: object
16) What is the best role of Sergi Roberto
?¶
In [72]:
segi_best_role = df[df['Name']=='Sergi Roberto']['Best Role'].iloc[0]
segi_best_role
Out[72]:
'BBM'
17) Calculate the average vision score for each position and Store the result in avg_vision_per_position
¶
In [73]:
avg_vision_per_position = df.groupby('Position')[['Vision']].mean()
18) What is the average vision score for players in the AM (C)
position?¶
In [76]:
avg_vision_per_position.loc['AM (C)'].iloc[0]
Out[76]:
12.972972972972974
19) Who is the top player by Potential Ability in the Spanish First Division
?¶
In [95]:
players_spanish = df[df['Division'] == 'Spanish First Division']
players_spanish.loc[players_spanish['Potential Ability'].idxmax()].iloc[0]
Out[95]:
'Lionel Messi'
20) Calculate the average market value of players within each club and position combination¶
In [106]:
avg_market_value_init = df.groupby(['Club','Position'])['Value'].mean()
avg_market_value = avg_market_value_init.reset_index()
avg_market_value.columns = ['Club', 'Position', 'Value']
avg_market_value
Out[106]:
Club | Position | Value | |
---|---|---|---|
0 | 1. FC Heidenheim 1846 | D (C) | 525000.0 |
1 | 1. FC Heidenheim 1846 | D (L), M/AM (LC) | 925000.0 |
2 | 1. FC Heidenheim 1846 | D (LC), DM, M (C) | 825000.0 |
3 | 1. FC Heidenheim 1846 | D/WB/M (L) | 525000.0 |
4 | 1. FC Heidenheim 1846 | D/WB/M/AM (R) | 700000.0 |
... | ... | ... | ... |
10658 | Śląsk Wrocław | D (RLC) | 500000.0 |
10659 | Śląsk Wrocław | D/WB/M/AM (L) | 475000.0 |
10660 | Śląsk Wrocław | DM, M (C) | 475000.0 |
10661 | Śląsk Wrocław | M (RL), AM (RLC), ST (C) | 375000.0 |
10662 | Śląsk Wrocław | ST (C) | 800000.0 |
10663 rows × 3 columns
21) Count the number of players within each nation and preferred foot combination¶
In [111]:
player_counts_nation = df.groupby(['Nation', 'Preferred Foot']).size()
player_counts_nation_pf = player_counts_nation.reset_index()
player_counts_nation_pf.columns=['Nation', 'Preferred Foot', 'Player Count']
player_counts_nation_pf
Out[111]:
Nation | Preferred Foot | Player Count | |
---|---|---|---|
0 | ALB | Either | 2 |
1 | ALB | Left | 6 |
2 | ALB | Right | 26 |
3 | ALB | Right Only | 5 |
4 | ALG | Left | 11 |
... | ... | ... | ... |
571 | ZAM | Right Only | 2 |
572 | ZIM | Left | 2 |
573 | ZIM | Left Only | 1 |
574 | ZIM | Right | 5 |
575 | ZIM | Right Only | 1 |
576 rows × 3 columns
Activities using Aggregation¶
22) Calculate the mean value and maximum current ability of players within each club¶
In [113]:
club_aggregations = df.groupby('Club').agg({
'Value':'mean',
'Current Ability':'max'
})
club_aggregations
Out[113]:
Value | Current Ability | |
---|---|---|
Club | ||
1. FC Heidenheim 1846 | 1.003846e+06 | 125 |
1. FC Köln | 5.192308e+06 | 142 |
1. FC Köln II | 8.250000e+05 | 118 |
1. FC Nürnberg | 2.385227e+06 | 129 |
1860 Munich | 4.875000e+05 | 114 |
... | ... | ... |
Östersunds FK | 4.250000e+05 | 108 |
Újpest | 4.500000e+05 | 113 |
Ümraniyespor | 5.562500e+05 | 115 |
İstanbulspor | 6.812500e+05 | 110 |
Śląsk Wrocław | 5.678571e+05 | 115 |
1369 rows × 2 columns
23) Calculate Age Range by Nation¶
In [126]:
def age_range(x):
return x.max() - x.min()
age_range_per_nation = df.groupby('Nation')['Age'].agg({age_range}).reset_index()
age_range_per_nation.columns = ['Nation', 'Age Range']
age_range_per_nation = age_range_per_nation.set_index('Nation')
age_range_per_nation
Out[126]:
Age Range | |
---|---|
Nation | |
ALB | 14 |
ALG | 12 |
ANG | 12 |
ARG | 22 |
ARM | 7 |
... | ... |
VEN | 15 |
WAL | 14 |
YEM | 0 |
ZAM | 11 |
ZIM | 10 |
161 rows × 1 columns
24) What is the average wage of players in the 3. Liga
, the mean stamina of players in the 3F Superliga
, and the standard deviation of player stamina in the Albanian Superleague
?¶
Enter the three values in the format: answer1, answer2, answer3
.
In [131]:
liga3_players = df[df['Division'] == '3. Liga']
print('Average wage of players in 3. Liga: ', liga3_players['Wage'].sum())
superliga3F_players = df[df['Division'] == '3F Superliga']
print('Mean stamina of players in the 3F Superliga: ', superliga3F_players['Stamina'].mean())
albanian_superleague_players = df[df['Division'] == 'Albanian Superleague']
print('Standard deviation of player stamina in the Albanian Superleague: ', albanian_superleague_players['Stamina'].std())
Average wage of players in 3. Liga: 542400 Mean stamina of players in the 3F Superliga: 12.422619047619047 Standard deviation of player stamina in the Albanian Superleague: 3.055050463303893
25) Calculate Average Value and Current Ability Variance by Club¶
In [138]:
def variance(x):
return x.var()
club_statistics = df.groupby('Club').agg({
'Value':'mean',
'Current Ability':variance
})
club_statistics
Out[138]:
Value | Current Ability | |
---|---|---|
Club | ||
1. FC Heidenheim 1846 | 1.003846e+06 | 19.435897 |
1. FC Köln | 5.192308e+06 | 99.835385 |
1. FC Köln II | 8.250000e+05 | NaN |
1. FC Nürnberg | 2.385227e+06 | 28.112554 |
1860 Munich | 4.875000e+05 | 0.500000 |
... | ... | ... |
Östersunds FK | 4.250000e+05 | NaN |
Újpest | 4.500000e+05 | NaN |
Ümraniyespor | 5.562500e+05 | 16.666667 |
İstanbulspor | 6.812500e+05 | 6.250000 |
Śląsk Wrocław | 5.678571e+05 | 8.000000 |
1369 rows × 2 columns
Activities using Apply¶
26) Classify Players Based on Ability¶
In [144]:
def player_type(player):
if player['Current Ability'] > 180 and player['Potential Ability']>190:
return 'Star'
else:
return 'Regular'
df['Player Type'] = df.apply(player_type, axis=1)
df
Out[144]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Current Ability | Potential Ability | Vision | Stamina | Pace | Determination | Concentration | Balance | Acceleration | Player Type | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 195 | 200 | 20 | 13 | 15 | 20 | 13 | 19 | 18 | Star |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 195 | 196 | 14 | 15 | 16 | 20 | 16 | 14 | 15 | Star |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 181 | 196 | 13 | 14 | 20 | 14 | 13 | 14 | 20 | Star |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 176 | 193 | 12 | 10 | 10 | 18 | 15 | 14 | 13 | Regular |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 186 | 190 | 16 | 13 | 16 | 16 | 10 | 14 | 17 | Regular |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 56 | 63 | 6 | 12 | 13 | 9 | 7 | 6 | 13 | Regular |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 59 | 62 | 6 | 9 | 8 | 4 | 9 | 12 | 7 | Regular |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 58 | 62 | 5 | 10 | 6 | 6 | 8 | 12 | 7 | Regular |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 50 | 61 | 1 | 6 | 10 | 12 | 1 | 4 | 13 | Regular |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 46 | 60 | 12 | 12 | 12 | 9 | 4 | 4 | 12 | Regular |
13987 rows × 24 columns
27) Categorize Players by Market Value¶
In [150]:
def categorize_by_value(value):
if value['Value'] > 50000000:
return 'High'
elif 20000000 < value['Value'] <= 50000000:
return 'Medium'
else:
return 'Low'
df['Value Type'] = df.apply(categorize_by_value, axis=1)
df
Out[150]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Potential Ability | Vision | Stamina | Pace | Determination | Concentration | Balance | Acceleration | Player Type | Value Type | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 200 | 20 | 13 | 15 | 20 | 13 | 19 | 18 | Star | High |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 196 | 14 | 15 | 16 | 20 | 16 | 14 | 15 | Star | Medium |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 196 | 13 | 14 | 20 | 14 | 13 | 14 | 20 | Star | High |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 193 | 12 | 10 | 10 | 18 | 15 | 14 | 13 | Regular | Medium |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 190 | 16 | 13 | 16 | 16 | 10 | 14 | 17 | Regular | High |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 63 | 6 | 12 | 13 | 9 | 7 | 6 | 13 | Regular | Low |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 62 | 6 | 9 | 8 | 4 | 9 | 12 | 7 | Regular | Low |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 62 | 5 | 10 | 6 | 6 | 8 | 12 | 7 | Regular | Low |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 61 | 1 | 6 | 10 | 12 | 1 | 4 | 13 | Regular | Low |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 60 | 12 | 12 | 12 | 9 | 4 | 4 | 12 | Regular | Low |
13987 rows × 25 columns
28) Categorize Players by Age¶
In [152]:
def categorize_by_age(age):
if age['Age'] < 25:
return 'Young'
elif 25 <= age['Age'] <=29:
return 'Mid-age'
else:
return 'Senior'
df['Age Group'] = df.apply(categorize_by_age, axis=1)
29) Who are the top 2 most valuable players in 1. FC Heidenheim 1846
?¶
In [164]:
fc_diviision = df[df['Club'] == '1. FC Heidenheim 1846']
fc_diviision.sort_values(by='Value', ascending=False)['Name'].head(2)
Out[164]:
5690 Tim Kleindienst 2851 Niklas Dorsch Name: Name, dtype: object
30) Calculate BMI for Each Player¶
In [168]:
def calculate_bmi(player):
playerHeight_meter = player['Height']/100
return player['Weight'] / (playerHeight_meter**2)
df['BMI'] = df.apply(calculate_bmi, axis = 1)
df
Out[168]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Stamina | Pace | Determination | Concentration | Balance | Acceleration | Player Type | Value Type | Age Group | BMI | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 13 | 15 | 20 | 13 | 19 | 18 | Star | High | Senior | 24.913495 |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 15 | 16 | 20 | 16 | 14 | 15 | Star | Medium | Senior | 24.251278 |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 14 | 20 | 14 | 13 | 14 | 20 | Star | High | Young | 23.040020 |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 10 | 10 | 18 | 15 | 14 | 13 | Regular | Medium | Senior | 24.414062 |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 13 | 16 | 16 | 10 | 14 | 17 | Regular | High | Mid-age | 22.204082 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 12 | 13 | 9 | 7 | 6 | 13 | Regular | Low | Young | 23.702162 |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 9 | 8 | 4 | 9 | 12 | 7 | Regular | Low | Mid-age | 21.329438 |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 10 | 6 | 6 | 8 | 12 | 7 | Regular | Low | Mid-age | 22.694019 |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 6 | 10 | 12 | 1 | 4 | 13 | Regular | Low | Young | 20.345052 |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 12 | 12 | 9 | 4 | 4 | 12 | Regular | Low | Young | 21.736505 |
13987 rows × 27 columns
Activities Using Transform¶
31) Rank Players by Market Value Within Each Nation¶
In [171]:
df['Value Rank'] = df.groupby('Nation')['Value'].transform(lambda x: x.rank(ascending=False).astype(int))
df
Out[171]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Pace | Determination | Concentration | Balance | Acceleration | Player Type | Value Type | Age Group | BMI | Value Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 15 | 20 | 13 | 19 | 18 | Star | High | Senior | 24.913495 | 2 |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 16 | 20 | 16 | 14 | 15 | Star | Medium | Senior | 24.251278 | 12 |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 20 | 14 | 13 | 14 | 20 | Star | High | Young | 23.040020 | 1 |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 10 | 18 | 15 | 14 | 13 | Regular | Medium | Senior | 24.414062 | 12 |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 16 | 16 | 10 | 14 | 17 | Regular | High | Mid-age | 22.204082 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 13 | 9 | 7 | 6 | 13 | Regular | Low | Young | 23.702162 | 624 |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 8 | 4 | 9 | 12 | 7 | Regular | Low | Mid-age | 21.329438 | 624 |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 6 | 6 | 8 | 12 | 7 | Regular | Low | Mid-age | 22.694019 | 477 |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 10 | 12 | 1 | 4 | 13 | Regular | Low | Young | 20.345052 | 450 |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 12 | 9 | 4 | 4 | 12 | Regular | Low | Young | 21.736505 | 624 |
13987 rows × 28 columns
32) Standardize Vision Ratings Within Each Club¶
In [175]:
def standardize(x):
return (x - x.mean()) / x.std()
df['Standardized Vision'] = df.groupby('Club')['Vision'].transform(standardize)
df
Out[175]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Determination | Concentration | Balance | Acceleration | Player Type | Value Type | Age Group | BMI | Value Rank | Standardized Vision | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 20 | 13 | 19 | 18 | Star | High | Senior | 24.913495 | 2 | 2.212736 |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 20 | 16 | 14 | 15 | Star | Medium | Senior | 24.251278 | 12 | 0.699648 |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 14 | 13 | 14 | 20 | Star | High | Young | 23.040020 | 1 | 0.181786 |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 18 | 15 | 14 | 13 | Regular | Medium | Senior | 24.414062 | 12 | -0.425385 |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 16 | 10 | 14 | 17 | Regular | High | Mid-age | 22.204082 | 1 | 1.272499 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 9 | 7 | 6 | 13 | Regular | Low | Young | 23.702162 | 624 | -1.024695 |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 4 | 9 | 12 | 7 | Regular | Low | Mid-age | 21.329438 | 624 | -0.898303 |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 6 | 8 | 12 | 7 | Regular | Low | Mid-age | 22.694019 | 477 | -1.494818 |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 12 | 1 | 4 | 13 | Regular | Low | Young | 20.345052 | 450 | -2.336427 |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 9 | 4 | 4 | 12 | Regular | Low | Young | 21.736505 | 624 | 1.513250 |
13987 rows × 29 columns
33) Calculate Age Percentiles Within Each Club¶
In [178]:
def calculate_percentile(x):
return x.rank(pct=True)
df['Age Percentile'] = df.groupby('Club')['Age'].transform(calculate_percentile)
df
Out[178]:
Name | Position | Club | Division | Based | Nation | Height | Weight | Age | Preferred Foot | ... | Concentration | Balance | Acceleration | Player Type | Value Type | Age Group | BMI | Value Rank | Standardized Vision | Age Percentile | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Lionel Messi | AM (RC), ST (C) | Barcelona | Spanish First Division | Spain (First Division) | ARG | 170 | 72 | 32 | Left | ... | 13 | 19 | 18 | Star | High | Senior | 24.913495 | 2 | 2.212736 | 0.982759 |
1 | Cristiano Ronaldo | AM (RL), ST (C) | Juventus | Italian Serie A | Italy (Serie A) | POR | 185 | 83 | 34 | Either | ... | 16 | 14 | 15 | Star | Medium | Senior | 24.251278 | 12 | 0.699648 | 0.948276 |
2 | Kylian Mbappé | AM (RL), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | FRA | 178 | 73 | 20 | Right | ... | 13 | 14 | 20 | Star | High | Young | 23.040020 | 1 | 0.181786 | 0.104167 |
3 | Manuel Neuer | GK | FC Bayern | Bundesliga | Germany (Bundesliga) | GER | 192 | 90 | 33 | Either | ... | 15 | 14 | 13 | Regular | Medium | Senior | 24.414062 | 12 | -0.425385 | 1.000000 |
4 | Neymar | M (L), AM (LC), ST (C) | Paris SG | Ligue 1 Conforama | France (Ligue 1 Conforama) | BRA | 175 | 68 | 27 | Right | ... | 10 | 14 | 17 | Regular | High | Mid-age | 22.204082 | 1 | 1.272499 | 0.708333 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
13982 | Mirza'ekber | AM (C), ST (C) | XJ TS Leopard | Chinese National First Division | China (First Division) | CHN | 186 | 82 | 24 | Right Only | ... | 7 | 6 | 13 | Regular | Low | Young | 23.702162 | 624 | -1.024695 | 0.500000 |
13983 | Miao Ming | D (C) | Meizhou | Chinese National First Division | China (First Division) | CHN | 185 | 73 | 29 | Right | ... | 9 | 12 | 7 | Regular | Low | Mid-age | 21.329438 | 624 | -0.898303 | 0.825000 |
13984 | Yu Jiawei | D (LC), M (L) | SY Urban | Chinese National Second Division Northern Group | China (Second Division North) | CHN | 183 | 76 | 25 | Left Only | ... | 8 | 12 | 7 | Regular | Low | Mid-age | 22.694019 | 477 | -1.494818 | 0.166667 |
13985 | Liu Xinyu | ST (C) | Renhe | Chinese Super League | China (Super League) | CHN | 192 | 75 | 19 | Right Only | ... | 1 | 4 | 13 | Regular | Low | Young | 20.345052 | 450 | -2.336427 | 0.055556 |
13986 | Han Tianlin | M (C) | Changchun | Chinese National First Division | China (First Division) | CHN | 182 | 72 | 22 | Right Only | ... | 4 | 4 | 12 | Regular | Low | Young | 21.736505 | 624 | 1.513250 | 0.250000 |
13987 rows × 30 columns
34) Calculate Deviation from Mean Pace Within Each Club¶
In [180]:
def deviation_from_mean(x):
return x - x.mean()
df['Pace Deviation'] = df.groupby('Club')['Pace'].transform(deviation_from_mean)
35) Rank Players by Wage within Each Club¶
In [183]:
def rank_wage(wages):
return wage.rank(ascending=False)
df['Wage Rank'] = df.groupby('Club')['Wage'].transform(rank_wage)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[183], line 4 1 def rank_wage(wages): 2 return wage.rank(ascending=False) ----> 4 df['Wage Rank'] = df.groupby('Club')['Wage'].transform(rank_wage) File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/generic.py:517, in SeriesGroupBy.transform(self, func, engine, engine_kwargs, *args, **kwargs) 514 @Substitution(klass="Series", example=__examples_series_doc) 515 @Appender(_transform_template) 516 def transform(self, func, *args, engine=None, engine_kwargs=None, **kwargs): --> 517 return self._transform( 518 func, *args, engine=engine, engine_kwargs=engine_kwargs, **kwargs 519 ) File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:2021, in GroupBy._transform(self, func, engine, engine_kwargs, *args, **kwargs) 2018 warn_alias_replacement(self, orig_func, func) 2020 if not isinstance(func, str): -> 2021 return self._transform_general(func, engine, engine_kwargs, *args, **kwargs) 2023 elif func not in base.transform_kernel_allowlist: 2024 msg = f"'{func}' is not a valid function name for transform(name)" File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/generic.py:557, in SeriesGroupBy._transform_general(self, func, engine, engine_kwargs, *args, **kwargs) 552 for name, group in self._grouper.get_iterator( 553 self._obj_with_exclusions, axis=self.axis 554 ): 555 # this setattr is needed for test_transform_lambda_with_datetimetz 556 object.__setattr__(group, "name", name) --> 557 res = func(group, *args, **kwargs) 559 results.append(klass(res, index=group.index)) 561 # check for empty "results" to avoid concat ValueError Cell In[183], line 2, in rank_wage(wages) 1 def rank_wage(wages): ----> 2 return wage.rank(ascending=False) NameError: name 'wage' is not defined
In [ ]: