Statement of Completion#4467ddd9
Intro to Pandas for Data Analysis
medium
Explore, Filter, Sort and Analyse Cricket Sports Data Using Pandas
Resolution
Activities
Pitch Perfect Analysis: Cricket Stats with Pandas¶
Setting the Stage¶
In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv("preprocessed_data.csv")
In [2]:
# Take a peek at the data
df.head()
Out[2]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2024 | Ruturaj Gaikwad | 2 | 0 | 61 | 46.0 | 30.50 | 51 | 119.61 | 0 | ... | 2 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
4 | 2020 | Ruturaj Gaikwad | 6 | 2 | 204 | 72.0 | 51.00 | 169 | 120.71 | 0 | ... | 6 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1130 entries, 0 to 1129 Data columns (total 25 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Year 1130 non-null int64 1 Player_Name 1130 non-null object 2 Matches_Batted 1130 non-null int64 3 Not_Outs 1130 non-null int64 4 Runs_Scored 1130 non-null int64 5 Highest_Score 700 non-null float64 6 Batting_Average 1130 non-null float64 7 Balls_Faced 1130 non-null int64 8 Batting_Strike_Rate 1130 non-null float64 9 Centuries 1130 non-null int64 10 Half_Centuries 1130 non-null int64 11 Fours 1130 non-null int64 12 Sixes 1130 non-null int64 13 Catches_Taken 1130 non-null int64 14 Stumpings 1130 non-null int64 15 Matches_Bowled 1130 non-null int64 16 Balls_Bowled 1130 non-null int64 17 Runs_Conceded 1130 non-null int64 18 Wickets_Taken 1130 non-null int64 19 Best_Bowling_Match 419 non-null float64 20 Bowling_Average 1130 non-null float64 21 Economy_Rate 1130 non-null float64 22 Bowling_Strike_Rate 1130 non-null float64 23 Four_Wicket_Hauls 1130 non-null int64 24 Five_Wicket_Hauls 1130 non-null int64 dtypes: float64(7), int64(17), object(1) memory usage: 220.8+ KB
In [4]:
df.describe()
Out[4]:
Year | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | Half_Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 | 700.000000 | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 | ... | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 | 419.0 | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 | 1130.000000 |
mean | 2019.233628 | 9.016814 | 1.487611 | 132.253097 | 31.438571 | 17.442761 | 98.638053 | 102.314991 | 0.042478 | 0.735398 | ... | 9.049558 | 92.946903 | 125.013274 | 4.459292 | 0.0 | 18.572442 | 5.471097 | 13.050177 | 0.079646 | 0.017699 |
std | 3.992500 | 5.527504 | 1.697267 | 171.061138 | 30.996985 | 16.530296 | 122.684820 | 59.184873 | 0.269437 | 1.383147 | ... | 5.516159 | 118.389409 | 153.815733 | 6.363274 | 0.0 | 23.806204 | 4.568269 | 15.710709 | 0.310478 | 0.131914 |
min | 2008.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 2017.000000 | 3.000000 | 0.000000 | 5.000000 | 2.000000 | 1.000000 | 6.000000 | 66.165000 | 0.000000 | 0.000000 | ... | 3.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
50% | 2020.000000 | 10.000000 | 1.000000 | 42.000000 | 22.000000 | 14.575000 | 38.000000 | 118.950000 | 0.000000 | 0.000000 | ... | 10.000000 | 27.000000 | 44.500000 | 1.000000 | 0.0 | 15.340000 | 7.250000 | 12.000000 | 0.000000 | 0.000000 |
75% | 2022.750000 | 14.000000 | 2.000000 | 227.000000 | 54.000000 | 28.287500 | 167.000000 | 139.820000 | 0.000000 | 1.000000 | ... | 14.000000 | 168.750000 | 236.000000 | 8.000000 | 0.0 | 29.382500 | 8.800000 | 21.967500 | 0.000000 | 0.000000 |
max | 2024.000000 | 19.000000 | 10.000000 | 973.000000 | 129.000000 | 101.000000 | 640.000000 | 400.000000 | 4.000000 | 9.000000 | ... | 19.000000 | 408.000000 | 564.000000 | 32.000000 | 0.0 | 189.000000 | 36.000000 | 120.000000 | 3.000000 | 1.000000 |
8 rows × 24 columns
Warm Up Activities¶
In [6]:
# Run the cell below and find out which players we'll be expecting to see 🙌🏻
In [9]:
df['Player_Name'].unique().shape
Out[9]:
(206,)
1. Know the players!¶
In [10]:
# Write your code here
df['Runs_Scored'].max()
Out[10]:
973
2. Find the Maximum Runs Scored¶
In [ ]:
# Write your code here
3. How old is the dataset?¶
In [12]:
# Write your code here
df['Year'].unique().shape
Out[12]:
(17,)
4. What is the maximum Batting Strike Rate value?¶
In [14]:
df.columns
Out[14]:
Index(['Year', 'Player_Name', 'Matches_Batted', 'Not_Outs', 'Runs_Scored', 'Highest_Score', 'Batting_Average', 'Balls_Faced', 'Batting_Strike_Rate', 'Centuries', 'Half_Centuries', 'Fours', 'Sixes', 'Catches_Taken', 'Stumpings', 'Matches_Bowled', 'Balls_Bowled', 'Runs_Conceded', 'Wickets_Taken', 'Best_Bowling_Match', 'Bowling_Average', 'Economy_Rate', 'Bowling_Strike_Rate', 'Four_Wicket_Hauls', 'Five_Wicket_Hauls'], dtype='object')
In [15]:
# Write your code here
df['Batting_Strike_Rate'].max()
Out[15]:
400.0
Basic Filtering with loc[]
and Sorting with sort_values()
¶
5. Let's filter out players who played in the 2023!¶
In [16]:
players_2023 = df.loc[df['Year'] == 2023]
players_2023.head()
Out[16]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.0 | 0 | 0 |
8 | 2023 | MS Dhoni | 16 | 8 | 104 | NaN | 26.00 | 57 | 182.46 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.0 | 0 | 0 |
25 | 2023 | Ajinkya Rahane | 14 | 1 | 326 | NaN | 32.60 | 189 | 172.49 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.0 | 0 | 0 |
42 | 2023 | Ravindra Jadeja | 16 | 4 | 190 | NaN | 23.75 | 133 | 142.86 | 0 | ... | 16 | 342 | 431 | 20 | NaN | 21.55 | 7.56 | 17.1 | 0 | 0 |
5 rows × 25 columns
6. Top 5 Players with Max Not Outs¶
In [24]:
top_not_outs = df.sort_values(by='Not_Outs', ascending=False).head(5)
top_not_outs.head()
Out[24]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
925 | 2022 | Dinesh Karthik | 16 | 10 | 330 | NaN | 55.00 | 180 | 183.33 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0 | 0 |
17 | 2014 | MS Dhoni | 16 | 10 | 371 | NaN | 74.20 | 250 | 148.40 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0 | 0 |
13 | 2018 | MS Dhoni | 16 | 9 | 455 | NaN | 75.83 | 302 | 150.66 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0 | 0 |
222 | 2022 | David Miller | 16 | 9 | 481 | NaN | 68.71 | 337 | 142.73 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0 | 0 |
616 | 2017 | Hardik Pandya | 17 | 9 | 250 | NaN | 35.71 | 160 | 156.25 | 0 | ... | 17 | 156 | 213 | 6 | NaN | 35.5 | 8.19 | 26.0 | 0 | 0 |
5 rows × 25 columns
Just for exploration!¶
Analyzing High Scorers of the Season¶
Before we dive into the details of players scoring the most runs in 2023, let’s take a broader look at the run distribution over the years.
A histogram will help us understand in which years players scored more freely, potentially reflecting changes in playing conditions or rules.
In [ ]:
# Run the below cell to see the high run scorers of the season!
In [26]:
# Histogram of runs scored by year
plt.figure(figsize=(10, 6))
df.groupby('Year')['Runs_Scored'].sum().plot(kind='bar', color='lightcoral')
plt.title('Total Runs Scored Per Year')
plt.xlabel('Year')
plt.ylabel('Total Runs')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
7. Top 5 Run Scorers of 2023¶
In [28]:
top_scorers_2023 = df.loc[df['Year'] == 2023].sort_values(by='Runs_Scored', ascending=False).head(5)
top_scorers_2023.head()
Out[28]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
214 | 2023 | Shubman Gill | 17 | 2 | 890 | 129.0 | 59.33 | 564 | 157.80 | 3 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
888 | 2023 | Faf du Plessis | 14 | 1 | 730 | 84.0 | 56.15 | 475 | 153.68 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
903 | 2023 | Virat Kohli | 14 | 2 | 639 | NaN | 53.25 | 457 | 139.82 | 2 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
800 | 2023 | Yashasvi Jaiswal | 14 | 1 | 625 | 124.0 | 48.08 | 382 | 163.61 | 1 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
8. Selecting Players with High Batting Average¶
In [30]:
df.columns
Out[30]:
Index(['Year', 'Player_Name', 'Matches_Batted', 'Not_Outs', 'Runs_Scored', 'Highest_Score', 'Batting_Average', 'Balls_Faced', 'Batting_Strike_Rate', 'Centuries', 'Half_Centuries', 'Fours', 'Sixes', 'Catches_Taken', 'Stumpings', 'Matches_Bowled', 'Balls_Bowled', 'Runs_Conceded', 'Wickets_Taken', 'Best_Bowling_Match', 'Bowling_Average', 'Economy_Rate', 'Bowling_Strike_Rate', 'Four_Wicket_Hauls', 'Five_Wicket_Hauls'], dtype='object')
In [33]:
high_average_players = df.loc[df['Batting_Average'] > 50.0]
high_average_players.head()
Out[33]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 2020 | Ruturaj Gaikwad | 6 | 2 | 204 | 72.0 | 51.00 | 169 | 120.71 | 0 | ... | 6 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
12 | 2019 | MS Dhoni | 15 | 7 | 416 | NaN | 83.20 | 309 | 134.62 | 0 | ... | 15 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
13 | 2018 | MS Dhoni | 16 | 9 | 455 | NaN | 75.83 | 302 | 150.66 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
17 | 2014 | MS Dhoni | 16 | 10 | 371 | NaN | 74.20 | 250 | 148.40 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
Just for Exploration¶
The year 2023 was a real treat for cricket fans – it saw the highest number of centuries! Was it the pitches, the bats, or just plain skill?
Let’s check out a histogram of centuries made over the years and see how 2023 stood out
In [ ]:
# Run the below cell to find out the maximum number of centuries hit each season!
In [35]:
# Histogram of centuries by year
import matplotlib.pyplot as plt
centuries_per_year = df.groupby('Year')['Centuries'].sum()
plt.figure(figsize=(10, 6))
centuries_per_year.plot(kind='bar', color='orange')
plt.title('Centuries Recorded Per Year')
plt.xlabel('Year')
plt.ylabel('Total Centuries')
plt.xticks(rotation=45)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
Multi Condition Filtering¶
9. Compound Filtering for Centuries and Wickets¶
In [36]:
century_and_wickets = df.loc[(df['Centuries'] >= 1) & (df['Wickets_Taken'] >= 3)]
century_and_wickets.head()
Out[36]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
973 | 2023 | Cameron Green | 16 | 7 | 452 | NaN | 50.22 | 282 | 160.28 | 1 | ... | 16 | 228 | 361 | 6 | NaN | 60.17 | 9.5 | 38.0 | 0 | 0 |
1 rows × 25 columns
10. Recent High Scores¶
In [38]:
high_scores_2021 = df.loc[(df['Year'] == 2021) & (df['Runs_Scored'] > 50)]
high_scores_2021.head()
Out[38]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 |
10 | 2021 | MS Dhoni | 16 | 4 | 114 | NaN | 16.28 | 107 | 106.54 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 |
44 | 2021 | Ravindra Jadeja | 16 | 9 | 227 | NaN | 75.66 | 156 | 145.51 | 0 | ... | 16 | 294 | 346 | 13 | NaN | 26.61 | 7.06 | 22.61 | 0 | 0 |
63 | 2021 | Moeen Ali | 15 | 1 | 357 | 58.0 | 25.50 | 260 | 137.30 | 0 | ... | 15 | 152 | 161 | 6 | NaN | 26.83 | 6.35 | 25.33 | 0 | 0 |
70 | 2021 | Shivam Dube | 9 | 1 | 230 | NaN | 28.75 | 193 | 119.17 | 0 | ... | 9 | 30 | 49 | 0 | NaN | 0.00 | 9.80 | 0.00 | 0 | 0 |
5 rows × 25 columns
11. Filter out the Efficient Strikers from the dataset¶
The distribution of batting strike rates is heavily concentrated around the median of approximately 125. However, there are exceptional instances where the batting strike rates far exceed the average, reaching as high as 350 and beyond. These outliers represent the top performers, indicating a highly aggressive batting style.
Here is the distribution of batting strike rates, with a reference line at the value 250, highlighting these extraordinary performances
In [ ]:
# Run the below cell for distribution
In [40]:
# Boxplot of batting strike rates
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['Batting_Strike_Rate'])
plt.title('Boxplot of Batting Strike Rates')
plt.axvline(x=250, color='red', linestyle='--', label='Reference Line')
plt.xlabel('Batting Strike Rate')
plt.show()
In [41]:
efficient_strikers = df.loc[(df['Batting_Strike_Rate'] > 150) & (df['Runs_Scored'] > 200)]
efficient_strikers.head()
Out[41]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
13 | 2018 | MS Dhoni | 16 | 9 | 455 | NaN | 75.83 | 302 | 150.66 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
18 | 2013 | MS Dhoni | 18 | 5 | 461 | NaN | 41.90 | 283 | 162.89 | 0 | ... | 18 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
20 | 2011 | MS Dhoni | 16 | 4 | 392 | NaN | 43.55 | 247 | 158.70 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
25 | 2023 | Ajinkya Rahane | 14 | 1 | 326 | NaN | 32.60 | 189 | 172.49 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
45 | 2020 | Ravindra Jadeja | 14 | 6 | 232 | 50.0 | 46.40 | 135 | 171.85 | 0 | ... | 14 | 218 | 318 | 6 | NaN | 53.0 | 8.75 | 36.33 | 0 | 0 |
5 rows × 25 columns
12. Filtering with Multiple Conditions¶
In [46]:
specific_players = df.loc[(df['Runs_Scored'] > 300) & (df['Wickets_Taken'] == 0)]
In [47]:
specific_players.head()
Out[47]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
12 | 2019 | MS Dhoni | 15 | 7 | 416 | NaN | 83.20 | 309 | 134.62 | 0 | ... | 15 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
13. Who was the First Player to Achieve a Five Wicket Haul?!¶
In [50]:
df.columns
Out[50]:
Index(['Year', 'Player_Name', 'Matches_Batted', 'Not_Outs', 'Runs_Scored', 'Highest_Score', 'Batting_Average', 'Balls_Faced', 'Batting_Strike_Rate', 'Centuries', 'Half_Centuries', 'Fours', 'Sixes', 'Catches_Taken', 'Stumpings', 'Matches_Bowled', 'Balls_Bowled', 'Runs_Conceded', 'Wickets_Taken', 'Best_Bowling_Match', 'Bowling_Average', 'Economy_Rate', 'Bowling_Strike_Rate', 'Four_Wicket_Hauls', 'Five_Wicket_Hauls'], dtype='object')
In [54]:
first_five_wicket_haul = df.loc[df['Five_Wicket_Hauls'] > 0].nsmallest(1, 'Year')
In [55]:
first_five_wicket_haul.head()
Out[55]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
556 | 2008 | Amit Mishra | 6 | 0 | 37 | 31.0 | 12.33 | 42 | 88.09 | 0 | ... | 6 | 119 | 142 | 11 | NaN | 12.9 | 7.15 | 10.81 | 0 | 1 |
1 rows × 25 columns
In [57]:
df[df['Five_Wicket_Hauls'] > 0].sort_values(by='Year').nsmallest(1, 'Five_Wicket_Hauls')
Out[57]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
556 | 2008 | Amit Mishra | 6 | 0 | 37 | 31.0 | 12.33 | 42 | 88.09 | 0 | ... | 6 | 119 | 142 | 11 | NaN | 12.9 | 7.15 | 10.81 | 0 | 1 |
1 rows × 25 columns
Query Activities¶
14. Query for Five Wicket Hauls¶
In [60]:
five_wicket_hauls = df.query("Five_Wicket_Hauls >= 1")
five_wicket_hauls.head()
Out[60]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
53 | 2012 | Ravindra Jadeja | 19 | 2 | 191 | 48.0 | 15.91 | 151 | 126.49 | 0 | ... | 19 | 210 | 273 | 12 | NaN | 22.75 | 7.80 | 17.50 | 0 | 1 |
199 | 2011 | Ishant Sharma | 12 | 5 | 4 | NaN | 0.00 | 5 | 80.00 | 0 | ... | 12 | 264 | 314 | 11 | NaN | 28.54 | 7.13 | 24.00 | 0 | 1 |
338 | 2023 | Mohit Sharma | 14 | 1 | 0 | NaN | 0.00 | 1 | 0.00 | 0 | ... | 14 | 265 | 361 | 27 | NaN | 13.37 | 8.17 | 9.81 | 2 | 1 |
402 | 2021 | Andre Russell | 10 | 2 | 183 | 54.0 | 26.14 | 120 | 152.50 | 0 | ... | 10 | 114 | 188 | 11 | NaN | 17.09 | 9.89 | 10.36 | 0 | 1 |
437 | 2012 | Sunil Narine | 15 | 1 | 9 | NaN | 9.00 | 7 | 128.57 | 0 | ... | 15 | 355 | 324 | 24 | NaN | 13.50 | 5.47 | 14.79 | 1 | 1 |
5 rows × 25 columns
By the way , comment down below which is your fav IPL team this season!¶
In [ ]:
In [ ]:
15. Filtering with Query¶
In [62]:
active_high_scorers = df.query("Matches_Batted > 15 & Runs_Scored > 500")
active_high_scorers.head()
Out[62]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
36 | 2012 | Ajinkya Rahane | 16 | 2 | 560 | NaN | 40.00 | 433 | 129.33 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
124 | 2020 | David Warner | 16 | 2 | 548 | NaN | 39.14 | 407 | 134.64 | 0 | ... | 16 | 1 | 2 | 0 | NaN | 0.0 | 12.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
Multi Condition Sorting¶
16. Who is the Titan ?¶
In [64]:
sorted_by_runs_and_score = df.sort_values(by=['Runs_Scored', 'Highest_Score'], ascending = False)
sorted_by_runs_and_score.head()
Out[64]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
910 | 2016 | Virat Kohli | 16 | 4 | 973 | 113.0 | 81.08 | 640 | 152.03 | 4 | ... | 16 | 6 | 13 | 0 | NaN | 0.0 | 13.0 | 0.0 | 0 | 0 |
214 | 2023 | Shubman Gill | 17 | 2 | 890 | 129.0 | 59.33 | 564 | 157.80 | 3 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
786 | 2022 | Jos Buttler | 17 | 2 | 863 | 116.0 | 57.53 | 579 | 149.05 | 4 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
127 | 2016 | David Warner | 17 | 3 | 848 | NaN | 60.57 | 560 | 151.42 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
257 | 2018 | Kane Williamson | 17 | 3 | 735 | 84.0 | 52.50 | 516 | 142.44 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
17. How well do you understand the impact of economy rate changes over the years?¶
In [ ]:
Understanding Economy Rate¶
The economy rate is a key bowling metric that calculates the average runs conceded per over bowled—the lower the economy rate, the better the bowling!
From the below boxplot, it's clear that most bowlers consistently maintained a moderate economy rate over the years, with recent seasons like 2023 and 2024 showing a few instances of unusually high rates, hinting at some challenging games for the bowlers.
In [ ]:
# Run the below cell for understanding the 'Economy rate' over the seasons
In [66]:
df_clean = df.dropna(subset=['Economy_Rate'])
# Create a boxplot of economy rates across years
plt.figure(figsize=(14, 8))
sns.boxplot(x='Year', y='Economy_Rate', data=df_clean, palette='coolwarm')
plt.title('Boxplot of Economy Rates by Year')
plt.xlabel('Year')
plt.ylabel('Economy Rate')
plt.xticks(rotation=45)
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
In [ ]:
18. Sorting by Economy Rate for 2022¶
In [69]:
sorted_economy_2022 = df.loc[df['Year'] == 2022].sort_values(by='Economy_Rate', ascending=True)
sorted_economy_2022.head()
Out[69]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
465 | 2022 | Devdutt Padikkal | 17 | 0 | 376 | 54.0 | 22.12 | 306 | 122.88 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
233 | 2022 | Matthew Wade | 10 | 0 | 157 | 35.0 | 15.70 | 138 | 113.77 | 0 | ... | 10 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
237 | 2022 | Wriddhiman Saha | 11 | 1 | 317 | 68.0 | 31.70 | 259 | 122.39 | 0 | ... | 11 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
253 | 2022 | Kane Williamson | 13 | 2 | 216 | 57.0 | 19.64 | 231 | 93.51 | 0 | ... | 13 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
Filtering using : isin()
, nsmallest()
, and nlargest()
¶
19. Filtering Players by Specific Match Conditions¶
In [71]:
years = [2021, 2022]
players_recent_years = df.loc[df['Year'].isin(years)]
players_recent_years.head()
Out[71]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
6 | 2022 | Devon Conway | 7 | 1 | 252 | 87.0 | 42.00 | 173 | 145.66 | 0 | ... | 7 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
9 | 2022 | MS Dhoni | 14 | 6 | 232 | NaN | 33.14 | 188 | 123.40 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
10 | 2021 | MS Dhoni | 16 | 4 | 114 | NaN | 16.28 | 107 | 106.54 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
In [ ]:
Just for exploration¶
Let's plot the averages before we focus on players with extraordinary averages.
This line graph will help us see trends and anomalies in batting performances.
In [ ]:
# Run the below cell for understanding the 'Batting Averages' over the years
In [73]:
# Line plot of batting averages over the years
average_by_year = df.groupby('Year')['Batting_Average'].mean()
plt.figure(figsize=(10, 6))
average_by_year.plot(kind='line', marker='o', linestyle='-', color='navy')
plt.title('Average Batting Averages Over the Years')
plt.xlabel('Year')
plt.ylabel('Batting Average')
plt.grid(True)
plt.show()
20. Analyze Rohit Sharma's Matches Across Different Years¶
In [76]:
rohit_matches = df.loc[df['Player_Name'] == 'Rohit Sharma', ['Year', 'Matches_Batted']].sort_values(by='Year', ascending=True)
rohit_matches
Out[76]:
Year | Matches_Batted | |
---|---|---|
577 | 2008 | 13 |
576 | 2009 | 16 |
575 | 2010 | 16 |
574 | 2011 | 16 |
573 | 2012 | 17 |
572 | 2013 | 19 |
571 | 2014 | 15 |
570 | 2015 | 16 |
569 | 2016 | 14 |
568 | 2017 | 17 |
567 | 2018 | 14 |
566 | 2019 | 15 |
565 | 2020 | 12 |
564 | 2021 | 13 |
563 | 2022 | 14 |
562 | 2023 | 16 |
561 | 2024 | 1 |
Just for Exploration¶
Let’s use a violin plot to visualize the distribution of wickets taken across all matches, emphasizing performances in games where bowlers took more than five wickets.
The widest part of the plot indicates that most wickets taken fall around the 5 to 10 range, suggesting this is the most common performance for bowlers in significant matches.
The plot's long, thin tails extending towards the higher wicket counts, such as 20 and above, indicate these are rarer, exceptional performances.
In [ ]:
# Run the below cell
In [78]:
# Filter for high-performance games
high_wicket_games = df[df['Wickets_Taken'] >= 5]
# Violin plot for distribution of wickets in high-performance games
plt.figure(figsize=(12, 6))
sns.violinplot(x='Wickets_Taken', data=high_wicket_games)
plt.title('Distribution of Wickets Taken in High-Performance Games')
plt.xlabel('Wickets Taken')
plt.grid(True)
plt.show()
21. Who is the Purple Cap Winner of 2022?!¶
In [79]:
df.columns
Out[79]:
Index(['Year', 'Player_Name', 'Matches_Batted', 'Not_Outs', 'Runs_Scored', 'Highest_Score', 'Batting_Average', 'Balls_Faced', 'Batting_Strike_Rate', 'Centuries', 'Half_Centuries', 'Fours', 'Sixes', 'Catches_Taken', 'Stumpings', 'Matches_Bowled', 'Balls_Bowled', 'Runs_Conceded', 'Wickets_Taken', 'Best_Bowling_Match', 'Bowling_Average', 'Economy_Rate', 'Bowling_Strike_Rate', 'Four_Wicket_Hauls', 'Five_Wicket_Hauls'], dtype='object')
In [81]:
max_wickets_2022 = df.loc[df['Year'] == 2022].nlargest(1,'Wickets_Taken')
max_wickets_2022
Out[81]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
873 | 2022 | Yuzvendra Chahal | 17 | 1 | 5 | 5.0 | 5.0 | 8 | 62.5 | 0 | ... | 17 | 408 | 527 | 27 | NaN | 19.52 | 7.75 | 15.11 | 1 | 1 |
1 rows × 25 columns
22. Who is the player with the Lowest Economy Rate?¶
In [84]:
lowest_economy_rate = df.loc[df['Wickets_Taken'] >= 1].nsmallest(1, 'Economy_Rate')
lowest_economy_rate
Out[84]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
828 | 2009 | Ravichandran Ashwin | 2 | 0 | 8 | 8.0 | 8.0 | 15 | 53.33 | 0 | ... | 2 | 24 | 13 | 2 | NaN | 6.5 | 3.25 | 12.0 | 0 | 0 |
1 rows × 25 columns
23. Best Bowling Performance of that Player¶
In [87]:
best_bowling_performance = df.loc[df['Player_Name'] == 'Ravichandran Ashwin'].nlargest(1,'Best_Bowling_Match')
best_bowling_performance
Out[87]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
814 | 2024 | Ravichandran Ashwin | 1 | 0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 | ... | 1 | 24 | 35 | 1 | NaN | 35.0 | 8.75 | 24.0 | 0 | 0 |
1 rows × 25 columns
24. Identify Consistent Performers¶
In [99]:
year_run_df = df[(df['Year'].isin([2020, 2021, 2022, 2023])) & (df['Runs_Scored'] > 300)]
year_run_df
Out[99]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
25 | 2023 | Ajinkya Rahane | 14 | 1 | 326 | NaN | 32.60 | 189 | 172.49 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1018 | 2021 | Rahul Tripathi | 17 | 2 | 397 | NaN | 28.35 | 283 | 140.28 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1028 | 2021 | Mayank Agarwal | 12 | 1 | 441 | NaN | 40.09 | 314 | 140.44 | 0 | ... | 12 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1029 | 2020 | Mayank Agarwal | 11 | 0 | 424 | 106.0 | 38.54 | 271 | 156.45 | 1 | ... | 11 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1043 | 2023 | Heinrich Klaasen | 12 | 2 | 448 | 104.0 | 49.78 | 253 | 177.08 | 1 | ... | 12 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
1051 | 2022 | Abhishek Sharma | 14 | 0 | 426 | 75.0 | 30.43 | 320 | 133.13 | 0 | ... | 14 | 24 | 38 | 0 | NaN | 0.0 | 9.5 | 0.0 | 0 | 0 |
102 rows × 25 columns
In [97]:
# Filter the dataframe for players who scored more than 300 runs in the years 2020, 2021, 2022, and 2023
year_run_df = [2020, 2021, 2022, 2023]
consistent_performers = df.loc[(df['Year'].isin(year_run_df)) & (df['Runs_Scored'] > 300)]
consistent_performers.head()
Out[97]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2023 | Ruturaj Gaikwad | 16 | 1 | 590 | 92.0 | 42.14 | 400 | 147.50 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
2 | 2022 | Ruturaj Gaikwad | 14 | 0 | 368 | 99.0 | 26.29 | 291 | 126.46 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
3 | 2021 | Ruturaj Gaikwad | 16 | 2 | 635 | NaN | 45.35 | 466 | 136.26 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 | 2023 | Devon Conway | 16 | 2 | 672 | NaN | 51.69 | 481 | 139.71 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
25 | 2023 | Ajinkya Rahane | 14 | 1 | 326 | NaN | 32.60 | 189 | 172.49 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
5 rows × 25 columns
In [104]:
year_run_df['Player_Name'].value_counts()[year_run_df['Player_Name'].value_counts() == 4]
Out[104]:
Player_Name Sanju Samson 4 Shikhar Dhawan 4 Suryakumar Yadav 4 Nitish Rana 4 Virat Kohli 4 Shubman Gill 4 Faf du Plessis 4 Name: count, dtype: int64
In [101]:
# Check if a player is present in all four years using apply method and filter the dataframe accordingly
# Run the following code, run see if your output matches the expected output
consistent_performers = year_run_df[year_run_df['Player_Name'].isin(year_run_df['Player_Name'].value_counts()[year_run_df['Player_Name'].value_counts() == 4].index)]
consistent_performers
Out[101]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Matches_Bowled | Balls_Bowled | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
214 | 2023 | Shubman Gill | 17 | 2 | 890 | 129.0 | 59.33 | 564 | 157.80 | 3 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
215 | 2022 | Shubman Gill | 16 | 2 | 483 | 96.0 | 34.50 | 365 | 132.33 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
216 | 2021 | Shubman Gill | 17 | 0 | 478 | 57.0 | 28.11 | 402 | 118.90 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
217 | 2020 | Shubman Gill | 14 | 1 | 440 | NaN | 33.84 | 373 | 117.96 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
364 | 2023 | Nitish Rana | 14 | 1 | 413 | 75.0 | 31.77 | 293 | 140.96 | 0 | ... | 14 | 53 | 75 | 3 | NaN | 25.0 | 8.49 | 17.67 | 0 | 0 |
365 | 2022 | Nitish Rana | 14 | 1 | 361 | 57.0 | 27.77 | 251 | 143.82 | 0 | ... | 14 | 12 | 23 | 0 | NaN | 0.0 | 11.50 | 0.00 | 0 | 0 |
366 | 2021 | Nitish Rana | 17 | 3 | 383 | 80.0 | 29.46 | 314 | 121.97 | 0 | ... | 17 | 12 | 12 | 0 | NaN | 0.0 | 6.00 | 0.00 | 0 | 0 |
367 | 2020 | Nitish Rana | 14 | 0 | 352 | 87.0 | 25.14 | 254 | 138.58 | 0 | ... | 14 | 12 | 23 | 0 | NaN | 0.0 | 11.50 | 0.00 | 0 | 0 |
580 | 2023 | Suryakumar Yadav | 16 | 2 | 605 | NaN | 43.21 | 334 | 181.14 | 1 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
581 | 2022 | Suryakumar Yadav | 8 | 1 | 303 | NaN | 43.29 | 208 | 145.67 | 0 | ... | 8 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
582 | 2021 | Suryakumar Yadav | 14 | 0 | 317 | 82.0 | 22.64 | 221 | 143.43 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
583 | 2020 | Suryakumar Yadav | 16 | 3 | 480 | NaN | 40.00 | 331 | 145.01 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
671 | 2023 | Shikhar Dhawan | 11 | 2 | 373 | NaN | 41.44 | 261 | 142.91 | 0 | ... | 11 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
672 | 2022 | Shikhar Dhawan | 14 | 2 | 460 | NaN | 38.33 | 375 | 122.67 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
673 | 2021 | Shikhar Dhawan | 16 | 1 | 587 | 92.0 | 39.13 | 471 | 124.62 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
674 | 2020 | Shikhar Dhawan | 17 | 3 | 618 | NaN | 44.14 | 427 | 144.73 | 2 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
773 | 2023 | Sanju Samson | 14 | 2 | 362 | NaN | 30.17 | 236 | 153.39 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
774 | 2022 | Sanju Samson | 17 | 1 | 458 | 55.0 | 28.63 | 312 | 146.79 | 0 | ... | 17 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
775 | 2021 | Sanju Samson | 14 | 2 | 484 | 119.0 | 40.33 | 354 | 136.72 | 1 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
776 | 2020 | Sanju Samson | 14 | 1 | 375 | 85.0 | 28.84 | 236 | 158.89 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
888 | 2023 | Faf du Plessis | 14 | 1 | 730 | 84.0 | 56.15 | 475 | 153.68 | 0 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
889 | 2022 | Faf du Plessis | 16 | 1 | 468 | 96.0 | 31.20 | 367 | 127.52 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
890 | 2021 | Faf du Plessis | 16 | 2 | 633 | NaN | 45.21 | 458 | 138.20 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
891 | 2020 | Faf du Plessis | 13 | 2 | 449 | NaN | 40.81 | 319 | 140.75 | 0 | ... | 13 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
903 | 2023 | Virat Kohli | 14 | 2 | 639 | NaN | 53.25 | 457 | 139.82 | 2 | ... | 14 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
904 | 2022 | Virat Kohli | 16 | 1 | 341 | 73.0 | 22.73 | 294 | 115.99 | 0 | ... | 16 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
905 | 2021 | Virat Kohli | 15 | 1 | 405 | NaN | 28.92 | 339 | 119.46 | 0 | ... | 15 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
906 | 2020 | Virat Kohli | 15 | 4 | 466 | NaN | 42.36 | 384 | 121.35 | 0 | ... | 15 | 0 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.00 | 0 | 0 |
28 rows × 25 columns
25. Rank Players Based on Their Scores¶
In [108]:
df['Runs_Rank'] = df['Runs_Scored'].rank(method='max', ascending=False)
In [109]:
print(df[['Player_Name', 'Year', 'Runs_Scored', 'Runs_Rank']])
Player_Name Year Runs_Scored Runs_Rank 0 Ruturaj Gaikwad 2024 61 515.0 1 Ruturaj Gaikwad 2023 590 23.0 2 Ruturaj Gaikwad 2022 368 143.0 3 Ruturaj Gaikwad 2021 635 14.0 4 Ruturaj Gaikwad 2020 204 307.0 ... ... ... ... ... 1125 Mayank Markande 2023 21 674.0 1126 Mayank Markande 2022 0 1130.0 1127 Mayank Markande 2021 0 1130.0 1128 Mayank Markande 2019 6 838.0 1129 Mayank Markande 2018 21 674.0 [1130 rows x 4 columns]
Just for Exploration !¶
Every year we see a great number of new talents!
Let's plot a pie chart to understand the percentage of number of new debutants every year!
In [ ]:
# Run the below cell for understanding the % of debutants every season!
In [114]:
df.sort_values('Year').drop_duplicates('Player_Name')['Year']
Out[114]:
23 2008 577 2008 918 2008 556 2008 686 2008 ... 277 2024 310 2024 886 2024 142 2024 173 2024 Name: Year, Length: 206, dtype: int64
In [111]:
df['Debut_Year'] = df.sort_values('Year').drop_duplicates('Player_Name')['Year']
debutants_count = df[df['Debut_Year'].notna()]['Debut_Year'].value_counts().sort_index()
plt.figure(figsize=(10, 8))
plt.pie(debutants_count, labels=debutants_count.index, autopct='%1.1f%%', startangle=140, colors=plt.cm.Paired(range(len(debutants_count))))
plt.title('Percentage of Debutants by Year')
plt.axis('equal')
plt.show()
26. Career Starters¶
In [116]:
debutants_2022 = df.loc[df['Debut_Year'] == 2022]
debutants_2022
Out[116]:
Year | Player_Name | Matches_Batted | Not_Outs | Runs_Scored | Highest_Score | Batting_Average | Balls_Faced | Batting_Strike_Rate | Centuries | ... | Runs_Conceded | Wickets_Taken | Best_Bowling_Match | Bowling_Average | Economy_Rate | Bowling_Strike_Rate | Four_Wicket_Hauls | Five_Wicket_Hauls | Runs_Rank | Debut_Year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 2022 | Devon Conway | 7 | 1 | 252 | 87.0 | 42.00 | 173 | 145.66 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 251.0 | 2022.0 |
83 | 2022 | Daryl Mitchell | 2 | 0 | 33 | 17.0 | 16.50 | 44 | 75.00 | 0 | ... | 27 | 0 | NaN | 0.00 | 13.50 | 0.00 | 0 | 0 | 614.0 | 2022.0 |
95 | 2022 | Maheesh Theekshana | 9 | 1 | 7 | NaN | 7.00 | 7 | 100.00 | 0 | ... | 261 | 12 | NaN | 21.75 | 7.46 | 17.50 | 1 | 0 | 823.0 | 2022.0 |
96 | 2022 | Mukesh Choudhary | 13 | 1 | 6 | 4.0 | 6.00 | 6 | 100.00 | 0 | ... | 424 | 16 | NaN | 26.50 | 9.32 | 17.06 | 1 | 0 | 838.0 | 2022.0 |
104 | 2022 | Prashant Solanki | 2 | 0 | 0 | 0.0 | 0.00 | 0 | 0.00 | 0 | ... | 38 | 2 | NaN | 19.00 | 6.33 | 18.00 | 0 | 0 | 1130.0 | 2022.0 |
105 | 2022 | Simarjeet Singh | 6 | 2 | 7 | NaN | 7.00 | 8 | 87.50 | 0 | ... | 138 | 4 | NaN | 34.50 | 7.67 | 27.00 | 0 | 0 | 823.0 | 2022.0 |
111 | 2022 | Matheesha Pathirana | 2 | 0 | 0 | 0.0 | 0.00 | 0 | 0.00 | 0 | ... | 52 | 2 | NaN | 26.00 | 7.61 | 20.50 | 0 | 0 | 1130.0 | 2022.0 |
150 | 2022 | Tristan Stubbs | 2 | 0 | 2 | 2.0 | 1.00 | 4 | 50.00 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 911.0 | 2022.0 |
262 | 2022 | Abhinav Manohar | 8 | 1 | 108 | 43.0 | 18.00 | 75 | 144.00 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 426.0 | 2022.0 |
265 | 2022 | B. Sai Sudharsan | 5 | 1 | 145 | NaN | 36.25 | 114 | 127.19 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 373.0 | 2022.0 |
267 | 2022 | Darshan Nalkande | 2 | 0 | 0 | 0.0 | 0.00 | 0 | 0.00 | 0 | ... | 59 | 2 | NaN | 29.50 | 11.42 | 15.50 | 0 | 0 | 1130.0 | 2022.0 |
313 | 2022 | Sai Kishore | 5 | 0 | 0 | 0.0 | 0.00 | 0 | 0.00 | 0 | ... | 121 | 6 | NaN | 20.17 | 7.56 | 16.00 | 0 | 0 | 1130.0 | 2022.0 |
398 | 2022 | Ramandeep Singh | 5 | 2 | 45 | NaN | 22.50 | 40 | 112.50 | 0 | ... | 54 | 6 | NaN | 9.00 | 9.00 | 6.00 | 0 | 0 | 556.0 | 2022.0 |
421 | 2022 | Dushmantha Chameera | 12 | 4 | 43 | 17.0 | 10.75 | 28 | 153.57 | 0 | ... | 384 | 9 | NaN | 42.67 | 8.73 | 29.33 | 0 | 0 | 564.0 | 2022.0 |
424 | 2022 | Harshit Rana | 2 | 1 | 2 | 2.0 | 2.00 | 2 | 100.00 | 0 | ... | 51 | 1 | NaN | 51.00 | 10.20 | 30.00 | 0 | 0 | 911.0 | 2022.0 |
439 | 2022 | Vaibhav Arora | 5 | 2 | 5 | 2.0 | 2.50 | 13 | 38.46 | 0 | ... | 164 | 3 | NaN | 54.67 | 9.20 | 35.67 | 0 | 0 | 851.0 | 2022.0 |
488 | 2022 | Ayush Badoni | 13 | 3 | 161 | 54.0 | 20.13 | 130 | 123.85 | 0 | ... | 11 | 2 | NaN | 5.50 | 5.50 | 6.00 | 0 | 0 | 350.0 | 2022.0 |
525 | 2022 | Prerak Mankad | 1 | 1 | 4 | NaN | 0.00 | 1 | 400.00 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 866.0 | 2022.0 |
536 | 2022 | Mohsin Khan | 9 | 2 | 23 | NaN | 23.00 | 16 | 143.75 | 0 | ... | 197 | 14 | NaN | 14.07 | 5.97 | 14.14 | 1 | 0 | 656.0 | 2022.0 |
579 | 2022 | Dewald Brevis | 7 | 0 | 161 | 49.0 | 23.00 | 113 | 142.48 | 0 | ... | 8 | 1 | NaN | 8.00 | 16.00 | 3.00 | 0 | 0 | 350.0 | 2022.0 |
602 | 2022 | N. Tilak Varma | 14 | 3 | 397 | 61.0 | 36.09 | 303 | 131.02 | 0 | ... | 13 | 0 | NaN | 0.00 | 6.50 | 0.00 | 0 | 0 | 118.0 | 2022.0 |
626 | 2022 | Romario Shepherd | 3 | 1 | 58 | NaN | 29.00 | 41 | 141.46 | 0 | ... | 98 | 3 | NaN | 32.67 | 10.89 | 18.00 | 0 | 0 | 526.0 | 2022.0 |
643 | 2022 | Kumar Kartikeya Singh | 4 | 0 | 3 | 3.0 | 3.00 | 5 | 60.00 | 0 | ... | 102 | 5 | NaN | 20.40 | 7.85 | 15.60 | 0 | 0 | 882.0 | 2022.0 |
689 | 2022 | Jitesh Sharma | 12 | 2 | 234 | 44.0 | 29.25 | 143 | 163.64 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 274.0 | 2022.0 |
714 | 2022 | Shashank Singh | 10 | 1 | 69 | NaN | 17.25 | 47 | 146.81 | 0 | ... | 20 | 0 | NaN | 0.00 | 10.00 | 0.00 | 0 | 0 | 490.0 | 2022.0 |
813 | 2022 | Rovman Powell | 14 | 2 | 250 | NaN | 25.00 | 167 | 149.70 | 0 | ... | 17 | 0 | NaN | 0.00 | 17.00 | 0.00 | 0 | 0 | 254.0 | 2022.0 |
838 | 2022 | Kuldeep Sen | 7 | 1 | 0 | NaN | 0.00 | 3 | 0.00 | 0 | ... | 237 | 8 | NaN | 29.63 | 9.42 | 18.88 | 1 | 0 | 1130.0 | 2022.0 |
941 | 2022 | Suyash S Prabhudessai | 5 | 0 | 67 | 34.0 | 13.40 | 59 | 113.56 | 0 | ... | 0 | 0 | 0.0 | 0.00 | 0.00 | 0.00 | 0 | 0 | 498.0 | 2022.0 |
978 | 2022 | Akash Deep | 5 | 0 | 0 | 0.0 | 0.00 | 2 | 0.00 | 0 | ... | 205 | 5 | NaN | 41.00 | 10.88 | 22.60 | 0 | 0 | 1130.0 | 2022.0 |
999 | 2022 | Yash Dayal | 9 | 0 | 0 | 0.0 | 0.00 | 1 | 0.00 | 0 | ... | 296 | 11 | NaN | 26.91 | 9.25 | 17.45 | 0 | 0 | 1130.0 | 2022.0 |
1100 | 2022 | Fazalhaq Farooqi | 3 | 1 | 2 | NaN | 0.00 | 8 | 25.00 | 0 | ... | 110 | 2 | NaN | 55.00 | 9.17 | 36.00 | 0 | 0 | 911.0 | 2022.0 |
31 rows × 27 columns