Statement of Completion#96c0fbb2
Data Wrangling with Pandas
medium
Premier League Match Analysis
Resolution
Activities
Project.ipynb
In [1]:
import pandas as pd
In [67]:
df = pd.read_csv('premier-league-data.csv')
In [52]:
df.head()
Out[52]:
home_team | away_team | home_goals | away_goals | result | season | |
---|---|---|---|---|---|---|
0 | Sheffield United | Liverpool | 1 | 1 | D | 2006-2007 |
1 | Arsenal | Aston Villa | 1 | 1 | D | 2006-2007 |
2 | Everton | Watford | 2 | 1 | H | ? |
3 | Newcastle United | Wigan Athletic | 2 | 1 | H | 2006-2007 |
4 | Portsmouth | Blackburn Rovers | 3 | 0 | H | 2006-2007 |
Data Cleaning¶
Remove invalid values from the season
column¶
In [68]:
df.loc[
df['season'] == '?',
'season'
] = df['season'].str.replace('?', 'Unknown season')
In [69]:
df['season'].value_counts()
Out[69]:
season 2007-2008 380 2009-2010 380 2008-2009 380 2011-2012 380 2010-2011 380 2014-2015 380 2015-2016 380 2012-2013 380 2013-2014 380 2016-2017 380 2017-2018 380 2006-2007 349 Unknown season 31 Name: count, dtype: int64
Identify invalid values in goals scored¶
In [70]:
(df['away_goals'] < 0).sum()
Out[70]:
39
Replace invalid goals for 0¶
In [71]:
df.loc[
(df['away_goals'] < 0),
'away_goals'
] = 0
In [72]:
df['home_goals'] = df['home_goals'].apply(lambda x: 0 if x < 0 else x)
Identify and clean invalid results in the result
column¶
In [26]:
df.head(1)
Out[26]:
home_team | away_team | home_goals | away_goals | result | season | |
---|---|---|---|---|---|---|
0 | Sheffield United | Liverpool | 1 | 1 | D | 2006-2007 |
In [79]:
df['result'].value_counts()
Out[79]:
result H 2107 A 1294 D 1159 Name: count, dtype: int64
In [74]:
df.loc[
df['home_goals'] > df['away_goals'],
'result'
] = 'H'
In [76]:
df.loc[
df['away_goals'] > df['home_goals'],
'result'
] = 'A'
In [78]:
df.loc[
df['away_goals'] == df['home_goals'],
'result'
] = 'D'
Analysis¶
What's the average number of goals per match?¶
In [81]:
df.head(1)
Out[81]:
home_team | away_team | home_goals | away_goals | result | season | |
---|---|---|---|---|---|---|
0 | Sheffield United | Liverpool | 1 | 1 | D | 2006-2007 |
In [82]:
(df['home_goals'] + df['away_goals']).mean()
Out[82]:
2.6633771929824563
Create a new column total_goals
¶
In [86]:
df = df.drop(columns=['total_gols'])
In [87]:
df['total_goals'] = (df['home_goals'] + df['away_goals'])
Calculate average goals per season¶
In [95]:
df.groupby('season')['total_goals'].mean()
Out[95]:
season 2006-2007 2.429799 2007-2008 2.618421 2008-2009 2.463158 2009-2010 2.747368 2010-2011 2.797368 2011-2012 2.763158 2012-2013 2.773684 2013-2014 2.718421 2014-2015 2.500000 2015-2016 2.676316 2016-2017 2.794737 2017-2018 2.678947 Unknown season 2.419355 Name: total_goals, dtype: float64
In [96]:
goals_per_season = df.groupby('season')['total_goals'].mean()
What's the biggest goal difference in a match?¶
In [98]:
(abs(df['home_goals'] - df['away_goals'])).max()
Out[98]:
8
What's the team with most away wins?¶
In [105]:
df.loc[
df['result'] == 'A'
].groupby('away_team')['result'].size().sort_values()
Out[105]:
away_team Charlton Athletic 1 Brighton and Hove Albion 2 Cardiff City 2 Huddersfield Town 3 Sheffield United 3 Blackpool 5 Queens Park Rangers 7 Middlesbrough 8 Wolverhampton Wanderers 9 Reading 10 Birmingham City 10 Norwich City 12 Burnley 13 AFC Bournemouth 13 Hull City 13 Watford 15 Portsmouth 16 Leicester City 22 Fulham 23 Bolton Wanderers 26 Crystal Palace 27 Blackburn Rovers 27 Wigan Athletic 29 Swansea City 31 Southampton 33 West Bromwich Albion 34 Sunderland 35 Stoke City 36 Newcastle United 41 West Ham United 43 Aston Villa 53 Everton 66 Tottenham Hotspur 90 Liverpool 98 Manchester City 98 Arsenal 103 Manchester United 117 Chelsea 120 Name: result, dtype: int64
What's the team with the most goals scored at home?¶
In [107]:
df.groupby('home_team')['home_goals'].sum().sort_values(ascending=False)
Out[107]:
home_team Manchester City 499 Manchester United 495 Chelsea 488 Arsenal 471 Liverpool 459 Tottenham Hotspur 414 Everton 392 West Ham United 283 Newcastle United 267 Stoke City 244 Aston Villa 227 West Bromwich Albion 225 Sunderland 222 Fulham 211 Swansea City 179 Southampton 171 Blackburn Rovers 155 Bolton Wanderers 152 Wigan Athletic 140 Leicester City 119 Crystal Palace 111 Hull City 107 Portsmouth 102 Norwich City 96 Middlesbrough 92 Watford 91 AFC Bournemouth 84 Burnley 80 Reading 71 Birmingham City 67 Wolverhampton Wanderers 62 Queens Park Rangers 60 Blackpool 30 Brighton and Hove Albion 24 Sheffield United 23 Cardiff City 20 Charlton Athletic 19 Huddersfield Town 16 Derby County 12 Name: home_goals, dtype: int64
What's the team that received the least amount of goals while playing at home?¶
In [ ]:
df.groupby('home_team')
In [111]:
(df.groupby('home_team')['away_goals'].sum()/df.groupby('home_team').size()).sort_values()
Out[111]:
home_team Manchester United 0.692982 Liverpool 0.789474 Arsenal 0.802632 Chelsea 0.802632 Manchester City 0.815789 Tottenham Hotspur 0.956140 Birmingham City 1.017544 Everton 1.043860 Charlton Athletic 1.052632 Sheffield United 1.105263 Burnley 1.118421 Stoke City 1.121053 Leicester City 1.131579 Southampton 1.131579 Blackburn Rovers 1.157895 Middlesbrough 1.184211 Portsmouth 1.184211 Aston Villa 1.231579 Sunderland 1.242105 Newcastle United 1.263158 Fulham 1.276316 Norwich City 1.276316 Swansea City 1.278195 Huddersfield Town 1.315789 Crystal Palace 1.315789 Brighton and Hove Albion 1.315789 Bolton Wanderers 1.324561 Reading 1.333333 Queens Park Rangers 1.333333 West Ham United 1.334928 Watford 1.355263 West Bromwich Albion 1.385965 Wigan Athletic 1.413534 Hull City 1.494737 AFC Bournemouth 1.561404 Wolverhampton Wanderers 1.666667 Cardiff City 1.736842 Blackpool 1.947368 Derby County 2.263158 dtype: float64
What's the team with most goals scored playing as a visitor (away from home)?¶
In [112]:
df.groupby('away_team')['away_goals'].sum().sort_values(ascending=False)
Out[112]:
away_team Arsenal 379 Manchester United 366 Manchester City 359 Chelsea 357 Liverpool 348 Tottenham Hotspur 339 Everton 255 Aston Villa 214 West Ham United 209 Newcastle United 177 Sunderland 170 West Bromwich Albion 154 Stoke City 150 Fulham 127 Swansea City 127 Wigan Athletic 125 Southampton 123 Blackburn Rovers 122 Bolton Wanderers 111 Crystal Palace 103 Leicester City 98 Hull City 72 Reading 65 Burnley 64 Portsmouth 63 Norwich City 63 Watford 62 AFC Bournemouth 60 Wolverhampton Wanderers 56 Queens Park Rangers 55 Birmingham City 53 Middlesbrough 49 Blackpool 25 Charlton Athletic 15 Cardiff City 12 Huddersfield Town 12 Brighton and Hove Albion 10 Derby County 8 Sheffield United 8 Name: away_goals, dtype: int64
In [ ]: