Saratean Tudor has successfully completed this project.

Intro to Pandas for Data Analysis

easy

4.72

The Birthday Paradox in the NBA

Finished

May 24, 2025 9:24 PM

In [1]:

import math
import pandas as pd

In [3]:

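def combination(n, k):
    # Number of ways to choose k items from n, ignoring order ("n choose k").
    # Integer division keeps the result exact; the quotient is always whole.
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))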
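As a cross-check, Python 3.8+ provides `math.comb`, which computes the same binomial coefficient using exact integer arithmetic. A minimal sketch, where `combination_int` is a hypothetical name that does not appear in the notebook:

```python
import math

def combination_int(n, k):
    # Same quantity as the factorial formula, returned as an exact integer.
    return math.comb(n, k)

pairs_10 = combination_int(10, 2)  # distinct pairs among 10 people
pairs_15 = combination_int(15, 2)  # distinct pairs among 15 people
print(pairs_10, pairs_15)
```

These pair counts are the exponents used by `birthday_probability` for `n` = 10 and `n` = 15.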

1. What's the probability when `n` = 10?¶

In [6]:

birthday_probability(10)

Out[6]:

0.11614023654879224

2. What's the probability when `n` is 15?¶

In [7]:

birthday_probability(15)

Out[7]:

0.25028790861398265

3. Implement the `birthday_probability` function¶

In [5]:

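def birthday_probability(number_of_people):
    # Approximation: each of the C(n, 2) pairs is treated as an independent
    # 1/365 chance of a match, so P(no match) = (364/365) ** C(n, 2).
    return (1 - (364/365)**combination(number_of_people, 2))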
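The formula above treats every pair as independent, which makes it an approximation. For comparison, here is a sketch of the exact probability under the usual assumption of 365 equally likely birthdays. The names `approx_birthday_probability` and `exact_birthday_probability` are hypothetical and not part of the notebook:

```python
import math

def approx_birthday_probability(n):
    # Pairwise approximation used in the notebook: 1 - (364/365)^C(n, 2).
    return 1 - (364 / 365) ** math.comb(n, 2)

def exact_birthday_probability(n):
    # Exact value assuming 365 equally likely birthdays:
    # 1 - P(all n birthdays are distinct).
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

for n in (10, 15, 23):
    print(n, approx_birthday_probability(n), exact_birthday_probability(n))
```

For these group sizes the two values differ by less than 0.01, so the approximation is adequate for the questions in this project.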

NBA Birthday Paradox Analysis¶

In [8]:

df = pd.read_csv('nba_2017.csv', parse_dates=['Birth Date'])

In [9]:

df.head()

Out[9]:

	Player	Pos	Age	Team	Birth Date
0	Alex Abrines	SG	23.0	Oklahoma City Thunder	1993-08-01
1	Quincy Acy	PF	26.0	Dallas Mavericks	1990-10-06
2	Quincy Acy	PF	26.0	Brooklyn Nets	1990-10-06
3	Steven Adams	C	23.0	Oklahoma City Thunder	1993-07-20
4	Arron Afflalo	SG	31.0	Sacramento Kings	1985-10-15

4. Create the `Birth Date` column¶

In [ ]:

df['Birth Date'].dt.strftime("%Y-%m-%d").head()

In [13]:

df["Birthday"] = df['Birth Date'].dt.strftime("%m-%d")
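The `%m-%d` format keeps only the month and day, which is what lets players born in different years count as sharing a birthday. A small sketch on a made-up two-row frame (the player names and dates are hypothetical, not taken from `nba_2017.csv`):

```python
import pandas as pd

# Hypothetical sample: two players born on the same day in different years.
sample = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "Birth Date": pd.to_datetime(["1990-10-06", "1985-10-06"]),
})

# Dropping the year makes the two rows directly comparable.
sample["Birthday"] = sample["Birth Date"].dt.strftime("%m-%d")

same_full_date = sample["Birth Date"].nunique() == 1
same_birthday = sample["Birthday"].nunique() == 1
print(same_full_date, same_birthday)
```

Note that `strftime` returns strings, so the `Birthday` column is compared as text rather than as dates.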

Interlude: Combinatorics¶

For this project, you're free to use any technique you prefer to work out how many players share a birthday on a given team. One recommendation is to use combinatorics, specifically combinations, via the itertools.combinations function. Here's a quick example. Suppose we have these samples:

Name	Birthday
John	March 5th
Mary	Sept 20th
Rob	March 5th

Using combinations, we can take all the samples in pairs (r=2) to compare them:

Person 1	Person 2
John	Mary
John	Rob
Mary	Rob

Using Python:

In [21]:

from itertools import combinations

In [17]:

names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]

In [18]:

# Note: combinations() returns a lazy iterator, so we wrap it in a list to display the pairs
list(combinations(names, 2))

Out[18]:

[('John', 'Mary'), ('John', 'Rob'), ('Mary', 'Rob')]

In [19]:

# Note: combinations() returns a lazy iterator, so we wrap it in a list to display the pairs
list(combinations(birthdays, 2))

Out[19]:

[('March 5th', 'Sept 20th'),
 ('March 5th', 'March 5th'),
 ('Sept 20th', 'March 5th')]

The pair ('March 5th', 'March 5th') shows that John and Rob share a birthday. The same comparison using Pandas:

In [ ]:

names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
names_df

In [ ]:

birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
birthdays_df

Combining it:

In [ ]:

df_concat = pd.concat([names_df, birthdays_df], axis=1)

In [ ]:

df_concat

In [ ]:

df_concat['Birthday 1'] == df_concat['Birthday 2']
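The comparison above produces a boolean Series. One way to use it, sketched here on the same toy data, is as a mask to keep only the matching pairs, or summed to count them. The variable names `pairs`, `same_day`, `shared`, and `n_shared` are illustrative:

```python
from itertools import combinations
import pandas as pd

names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]

# Names and birthdays are paired in the same order, so the rows line up.
pairs = pd.concat(
    [
        pd.DataFrame(list(combinations(names, 2)), columns=["Person 1", "Person 2"]),
        pd.DataFrame(list(combinations(birthdays, 2)), columns=["Birthday 1", "Birthday 2"]),
    ],
    axis=1,
)

same_day = pairs["Birthday 1"] == pairs["Birthday 2"]
shared = pairs[same_day]        # rows where both people share a birthday
n_shared = int(same_day.sum())  # True counts as 1
print(shared)
print(n_shared)
```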

End of the interlude! Now, it's your turn to answer questions.

Activities¶

In [20]:

df.head()

Out[20]:

	Player	Pos	Age	Team	Birth Date	Birthday
0	Alex Abrines	SG	23.0	Oklahoma City Thunder	1993-08-01	08-01
1	Quincy Acy	PF	26.0	Dallas Mavericks	1990-10-06	10-06
2	Quincy Acy	PF	26.0	Brooklyn Nets	1990-10-06	10-06
3	Steven Adams	C	23.0	Oklahoma City Thunder	1993-07-20	07-20
4	Arron Afflalo	SG	31.0	Sacramento Kings	1985-10-15	10-15

In [38]:

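# Keep only the Dallas Mavericks roster
new_df = df.loc[
    df['Team'] == 'Dallas Mavericks'
]

In [39]:

# Every pair of birthdays on the team
dates = pd.DataFrame(combinations(new_df['Birthday'], 2), columns=['Birthday 1', 'Birthday 2'])

In [40]:

# Every pair of players, generated in the same order as the birthday pairs
players = pd.DataFrame(combinations(new_df['Player'], 2), columns=['Person 1', 'Person 2'])

In [41]:

# Line the two frames up side by side
final_df = pd.concat([players, dates], axis=1)

In [37]:

# Count the pairs whose birthdays match (True counts as 1)
(final_df['Birthday 1'] == final_df['Birthday 2']).sum()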

Out[37]:
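The same count can also be obtained without building every pair: a birthday that appears c times on a roster contributes C(c, 2) matching pairs. The sketch below assumes a frame with `Team` and `Birthday` columns like the notebook's `df`. The helper `count_shared_pairs` and the toy roster are hypothetical, so the printed number says nothing about any real team:

```python
import math
import pandas as pd

def count_shared_pairs(frame, team):
    # Hypothetical helper: expects 'Team' and 'Birthday' columns.
    roster = frame.loc[frame["Team"] == team]
    counts = roster["Birthday"].value_counts()
    # A birthday seen c times yields C(c, 2) matching pairs.
    return sum(math.comb(int(c), 2) for c in counts)

# Made-up roster for illustration only (not taken from nba_2017.csv).
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Team": ["X", "X", "X", "X", "Y"],
    "Birthday": ["06-26", "06-26", "06-26", "01-01", "06-26"],
})
print(count_shared_pairs(toy, "X"))
```

Because the helper takes the team name as a parameter, it could be reused for each team in the activities rather than editing the filter by hand.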

7. In the Dallas Mavericks, who shares a birthday with J.J. Barea?¶

In [43]:

final_df.head()

Out[43]:

	Person 1	Person 2	Birthday 1	Birthday 2
0	Quincy Acy	Justin Anderson	10-06	11-19
1	Quincy Acy	J.J. Barea	10-06	06-26
2	Quincy Acy	Harrison Barnes	10-06	05-30
3	Quincy Acy	Ben Bentil	10-06	03-29
4	Quincy Acy	Andrew Bogut	10-06	11-28

In [45]:

final_df.loc[
    (
        (final_df['Person 1'] == 'J.J. Barea') |
        (final_df['Person 2'] == 'J.J. Barea')
    ) &
    (final_df['Birthday 1'] == final_df['Birthday 2'])
].head()

Out[45]:

	Person 1	Person 2	Birthday 1	Birthday 2
65	J.J. Barea	Deron Williams	06-26	06-26
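A more direct alternative, sketched under the assumption that the frame has `Player`, `Team`, and `Birthday` columns, is to look up the player's birthday and filter the roster on it. The helper `birthday_mates` is hypothetical. The three sample rows reuse names and birthdays shown in the outputs above:

```python
import pandas as pd

def birthday_mates(frame, team, player):
    # Hypothetical helper: expects 'Player', 'Team' and 'Birthday' columns.
    roster = frame.loc[frame["Team"] == team]
    target = roster.loc[roster["Player"] == player, "Birthday"]
    if target.empty:
        return []  # the player is not on this team
    same_day = roster["Birthday"] == target.iloc[0]
    # Exclude the player themselves from the result.
    return roster.loc[same_day & (roster["Player"] != player), "Player"].tolist()

# Rows copied from the notebook outputs shown earlier.
sample = pd.DataFrame({
    "Player": ["Quincy Acy", "J.J. Barea", "Deron Williams"],
    "Team": ["Dallas Mavericks"] * 3,
    "Birthday": ["10-06", "06-26", "06-26"],
})
print(birthday_mates(sample, "Dallas Mavericks", "J.J. Barea"))
```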

Statement of Completion#8c40480d

Intro to Pandas for Data Analysis

The Birthday Paradox in the NBA

1. What's the probability when n = 10?¶

2. What's the probability when n is 15?¶

3. Implement the birthday_probability function¶

NBA Birthday Paradox Analysis¶

4. Create the Birth Date column¶

Interlude: Combinatorics¶

Activities¶

5. How many pairs of players share a birthday for the Atlanta Hawks?¶

6. How many pairs of players share a birthday in the Cleveland Cavaliers?¶

7. In the Dallas Mavericks, who shares a birthday with J.J. Barea?¶

The End!¶
