Statement of Completion#48b2ea06
Intro to Pandas for Data Analysis
easy
Player Picks: A Dive into Series Filtering
Resolution
Activities
Player Picks: A Dive into Series Filtering¶
In [1]:
# import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Let's Kick-in¶
In [32]:
# Load the dataset
match_stats = pd.read_csv('Match-player-statistics.csv')
In [3]:
match_stats.head()
Out[3]:
MatchID | HomeTeamName | AwayTeamName | PlayerID | PlayerName | PlayerSurname | IsGoalkeeper | PlayedTime | StatsID | StatsName | Value | Rank | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2024442 | Austria | North Macedonia | 98843 | Julian | Baumgartlinger | False | 112 | 1 | Goals | 0 | 5 |
1 | 2024442 | Austria | North Macedonia | 98843 | Julian | Baumgartlinger | False | 112 | 3 | Total Attempts | 0 | 13 |
2 | 2024442 | Austria | North Macedonia | 98843 | Julian | Baumgartlinger | False | 112 | 4 | Attempts on target | 0 | 8 |
3 | 2024442 | Austria | North Macedonia | 98843 | Julian | Baumgartlinger | False | 112 | 5 | Attempts off target | 0 | 6 |
4 | 2024442 | Austria | North Macedonia | 98843 | Julian | Baumgartlinger | False | 112 | 6 | Attempts against woodwork | 0 | 1 |
In [6]:
match_stats.describe()
Out[6]:
MatchID | PlayerID | PlayedTime | StatsID | Rank | |
---|---|---|---|---|---|
count | 2.761810e+05 | 2.761810e+05 | 276181.000000 | 276181.000000 | 276181.000000 |
mean | 2.024467e+06 | 1.995570e+08 | 4288.032877 | 142.584309 | 7.452819 |
std | 1.489066e+01 | 9.997875e+07 | 2099.470203 | 146.984821 | 7.928549 |
min | 2.024441e+06 | 5.214800e+04 | 0.000000 | 1.000000 | 0.000000 |
25% | 2.024453e+06 | 2.500053e+08 | 2432.000000 | 58.000000 | 2.000000 |
50% | 2.024467e+06 | 2.500418e+08 | 5113.000000 | 109.000000 | 4.000000 |
75% | 2.024479e+06 | 2.500761e+08 | 5777.000000 | 167.000000 | 11.000000 |
max | 2.024491e+06 | 2.501551e+08 | 8142.000000 | 683.000000 | 34.000000 |
In [7]:
match_stats.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 276181 entries, 0 to 276180 Data columns (total 12 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 MatchID 276181 non-null int64 1 HomeTeamName 276181 non-null object 2 AwayTeamName 276181 non-null object 3 PlayerID 276181 non-null int64 4 PlayerName 247609 non-null object 5 PlayerSurname 276181 non-null object 6 IsGoalkeeper 276181 non-null bool 7 PlayedTime 276181 non-null int64 8 StatsID 276181 non-null int64 9 StatsName 276181 non-null object 10 Value 275795 non-null object 11 Rank 276181 non-null int64 dtypes: bool(1), int64(5), object(6) memory usage: 23.4+ MB
1. Create a series from the PlayerName
column.¶
In [9]:
player_names = pd.Series(match_stats["PlayerName"])
player_names
Out[9]:
0 Julian 1 Julian 2 Julian 3 Julian 4 Julian ... 276176 Manuel 276177 Manuel 276178 Manuel 276179 Manuel 276180 Manuel Name: PlayerName, Length: 276181, dtype: object
2. Filter out unique player names.¶
In [19]:
unique_player_names = player_names.unique()
unique_player_names
Out[19]:
array(['Julian', 'Andreas', 'Stefan', 'Marko', 'David', 'Aleksandar', 'Martin', 'Michael', 'Marcel', 'Daniel', 'Xaver', 'Konrad', 'Philipp', 'Christoph', 'Sasa', 'Goran', 'Ivan', 'Arijan', 'Stole', 'Visar', 'Ezgjan', 'Darko', 'Boban', 'Tihomir', 'Egzon', 'Enis', 'Milan', 'Eljif', 'Thomas', 'Toby', 'Axel', 'Jan', 'Kevin', 'Romelu', 'Thibaut', 'Thorgan', 'Dries', 'Nacer', 'Dennis', 'Youri', 'Jeremy', 'Giorgio', 'Ciro', 'Marco', 'Leonardo', 'Lorenzo', 'Bryan', 'Andrea', 'Domenico', 'Giovanni', 'Nicolò', nan, 'Gianluigi', 'Federico', 'Rafael', 'Eden', 'Yannick', 'Leander', 'Cristiano', 'Raphaël', 'Diogo', 'Dedryck', 'Timothy', 'Yuri', 'Artem', 'Anton', 'Magomed', 'Roman', 'Denis', 'Vyacheslav', 'Aleksandr', 'Dmitri', 'Andrei', 'Aleksei', 'Georgi', 'Daler', 'Igor', 'Maksim', 'Luka', 'Dejan', 'Domagoj', 'Andrej', 'Šime', 'Mateo', 'Ante', 'Marcelo', 'Dominik', 'Bruno', 'Josip', 'Nikola', 'Joško', 'Lukáš', 'Tomáš', 'Ondrej', 'Vladimír', 'Patrik', 'Petr', 'Alex', 'Jakub', 'Antonín', 'Adam', 'Borna', 'Stephen', 'Grant', 'Callum', 'Stuart', 'Ryan', 'John', 'Andy', 'Scott', 'Kieran', 'Nathan', 'Lyndon', 'Ché', 'Mislav', 'Mario', 'Duje', 'César', 'Pablo', 'Álvaro', 'Aymeric', 'José', 'Dani', 'Mikel', 'Matej', 'Kasper', 'Simon', 'Jens', 'Christian', 'Yussuf', 'Jannik', 'Pierre-Emile', 'Joachim', 'Mathias', 'Joakim', 'Mikkel', 'Kyle', 'Jordan', 'Harry', 'Raheem', 'Jack', 'Luke', 'Declan', 'Marcus', 'Jadon', 'Bukayo', 'Jude', 'Tyrone', 'Kalvin', 'Jason', 'Jonas', 'Tim', 'Joona', 'Jukka', 'Teemu', 'Joni', 'Rasmus', 'Lukas', 'Paulus', 'Jere', 'Joel', 'Robin', 'Glen', 'Leo', 'Mason', 'Phil', 'Dominic', 'Manuel', 'Toni', 'Mats', 'Emre', 'Antonio', 'Serge', 'Leon', 'Timo', 'Matthias', 'Leroy', 'Joshua', 'Kai', 'Jamal', 'Reece', 'Billy', 'Nikolai', 'Fredrik', 'Leandro', 'Michy', 'Hans', 'Pyry', 'Lassi', 'Rifat', 'Matvei', 'Karim', 'Hugo', 'Paul', 'Antoine', 'Adrien', 'Corentin', 'Lucas', 'Presnel', 'Ousmane', 'Kylian', 'Benjamin', "N'Golo", 'Ilkay', 'Moussa', 'Olivier', 'Clément', 'Kingsley', 'Yann', 'Admir', 'Xherdan', 'Steven', 'Haris', 'Granit', 'Ricardo', 'Remo', 'Silvan', 'Fabian', 'Nico', 'Breel', 'Ruben', 'Ádám', 'Péter', 'Loïc', 'Nemanja', 'Endre', 'László', 'Attila', 'Willi', 'Gergo', 'Roland', 'Szabolcs', 'András', 'Tamás', 'Dávid', 'Francesco', 'Matteo', 'Christopher', 'Louis', 'Florian', 'Alessandro', 'Marcos', 'Djibril', 'Salvatore', 'Giacomo', 'Gaetano', 'Chris', 'Gareth', 'Aaron', 'Joe', 'Danny', 'Ben', 'Ethan', 'Connor', 'Dylan', 'Neco', 'Kieffer', 'Georginio', 'Daley', 'Patrick', 'Luuk', 'Marten', 'Memphis', 'Wout', 'Maarten', 'Matthijs', 'Donyell', 'Frenkie', 'Owen', 'Denzel', 'Valentino', 'Quincy', 'Jurriën', 'Pavel', 'Michal', 'Joël', 'Andriy', 'Oleksandr', 'Serhiy', 'Georgiy', 'Ruslan', 'Mykola', 'Vitaliy', 'Illia', 'Ferhan', 'Vlatko', 'Cody', 'Grzegorz', 'Maciej', 'Wojciech', 'Kamil', 'Robert', 'Mateusz', 'Bartosz', 'Piotr', 'Karol', 'Przemyslaw', 'Tymoteusz', 'Peter', 'Marek', 'Róbert', 'Juraj', 'Ján', 'Lubomír', 'Jules', 'Niklas', 'Fedor', 'Liam', 'James', 'Stanislav', 'Adama', 'Pawel', 'Kacper', 'Sebastian', 'Mikael', 'Albin', 'Pierre', 'Emil', 'Viktor', 'Ludwig', 'Kristoffer', 'Victor', 'Alexander', 'Jens-Lys', 'Gustav', 'Filip', 'Taras', 'Yevhen', 'Loris', 'Eduard', 'Kire', 'Tyler', 'Nicolai'], dtype=object)
3. Who Are the Goalkeepers?¶
In [39]:
#goalkeeper_names = pd.Series(match_stats.loc[match_stats["IsGoalkeeper"]].PlayerName)
match_stats["IsGoalkeeper"].value_counts()
Out[39]:
IsGoalkeeper False 276181 Name: count, dtype: int64
4. Filter players with a specific surname.¶
In [47]:
player_surname = pd.Series(match_stats.loc[match_stats["PlayerSurname"] == "Alioski"].PlayerName)
player_surname
Out[47]:
3960 Ezgjan 3961 Ezgjan 3962 Ezgjan 3963 Ezgjan 3964 Ezgjan ... 264684 Ezgjan 264685 Ezgjan 264686 Ezgjan 264687 Ezgjan 264688 Ezgjan Name: PlayerName, Length: 540, dtype: object
5. Filter players who played less than 30 minutes.¶
In [55]:
player_time = pd.Series(match_stats.loc[match_stats["PlayedTime"] < 30].PlayerID)
player_time
Out[55]:
360 103354 361 103354 362 103354 363 103354 364 103354 ... 91344 250135265 91345 250135265 91346 250135265 91347 250135265 91348 250135265 Name: PlayerID, Length: 605, dtype: int64
6. How would you filter a series s
to keep only unique values?¶
In [ ]:
Just for Exploration¶
Ever wondered which teams have been the most consistent participants in matches, both on their home ground and away? This visualization offers a peek into the top 10 teams by total appearances, showcasing the football giants who consistently make it to the field, no matter the location.
In [57]:
team_appearances = pd.concat([match_stats['HomeTeamName'], match_stats['AwayTeamName']]).value_counts().nlargest(10)
plt.figure(figsize=(12, 6))
team_appearances.plot(kind='bar')
plt.title('Top 10 Teams by Appearances (Home + Away)')
plt.xlabel('Team Name')
plt.ylabel('Number of Appearances')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
7. Find players from a specific home team.¶
In [58]:
home_team_wales = pd.Series(match_stats.loc[match_stats["HomeTeamName"] == "Wales"].PlayerName)
home_team_wales
Out[58]:
265949 Gareth 265950 Gareth 265951 Gareth 265952 Gareth 265953 Gareth ... 276176 Manuel 276177 Manuel 276178 Manuel 276179 Manuel 276180 Manuel Name: PlayerName, Length: 10232, dtype: object
8. Filter player id's with missing player names.¶
In [69]:
missing_playername_id = pd.Series(match_stats.loc[match_stats["PlayerName"].isnull()].PlayerID)
missing_playername_id
Out[69]:
10080 250067749 10081 250067749 10082 250067749 10083 250067749 10084 250067749 ... 251184 250045024 251185 250045024 251186 250045024 251187 250045024 251188 250045024 Name: PlayerID, Length: 28572, dtype: int64
Just for Exploration¶
Curious to know who dominates the field in terms of sheer game time? This bar chart highlights the top 10 players based on their total playing time, revealing the endurance and commitment of these athletes. By observing this chart, we can explore how each player’s dedication translates into their performance on the field.
In [71]:
top_players = match_stats.groupby('PlayerName')['PlayedTime'].sum().nlargest(10)
plt.figure(figsize=(12, 6))
top_players.plot(kind='bar')
plt.title('Top 10 Players by Total Played Time')
plt.xlabel('Player Name')
plt.ylabel('Total Played Time')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
9. What's the difference between series.isnull()
and series.notnull()
?¶
In [ ]:
10. Filter players by multiple team names.¶
In [72]:
multiple_team_name = pd.Series(match_stats.loc[match_stats["HomeTeamName"].isin({"North Macedonia", "Germany"})].PlayerName)
multiple_team_name
Out[72]:
102149 Manuel 102150 Manuel 102151 Manuel 102152 Manuel 102153 Manuel ... 170364 Ryan 170365 Ryan 170366 Ryan 170367 Ryan 170368 Ryan Name: PlayerName, Length: 11340, dtype: object
11. What is the PlayerID
of Stefan?¶
In [75]:
player_id = match_stats.loc[match_stats["PlayerName"] == "Stefan"].PlayerID
# Display the player ID
print(f"Player ID for Stefan: {player_id.values[0]}")
Player ID for Stefan: 103354
12. Which of the following best describes the isin()
method in series filtering?¶
In [ ]: