Statement of Completion#8c40480d
Intro to Pandas for Data Analysis
easy
The Birthday Paradox in the NBA
Project.ipynb
In [1]:
import math
import pandas as pd
In [3]:
def combination(n, k):
    # Number of ways to choose k items from n (binomial coefficient)
    return math.factorial(n)/(math.factorial(k) * math.factorial(n - k))
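As a cross-check, the standard library's math.comb (Python 3.8+) computes the same binomial coefficient using integer arithmetic. A minimal sketch, with the notebook's combination function repeated so the snippet runs on its own:

```python
import math

def combination(n, k):
    # Same formula as the notebook's function; "/" returns a float
    return math.factorial(n) / (math.factorial(k) * math.factorial(n - k))

# math.comb returns an exact integer for the same quantity
pairs_float = combination(10, 2)
pairs_int = math.comb(10, 2)
print(pairs_float, pairs_int)  # 45.0 45
```

Either form works as the exponent in the probability formula; the integer version avoids float rounding for very large n.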
1. What's the probability when n = 10?
In [6]:
birthday_probability(10)
Out[6]:
0.11614023654879224
2. What's the probability when n = 15?
In [7]:
birthday_probability(15)
Out[7]:
0.25028790861398265
3. Implement the birthday_probability function
In [5]:
def birthday_probability(number_of_people):
    # Approximation: each of the C(n, 2) pairs independently has a 364/365 chance of not matching
    return 1 - (364/365)**combination(number_of_people, 2)
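The formula above treats every pair of people as an independent comparison, which is an approximation. For comparison, the sketch below also computes the exact probability under the usual assumptions (365 equally likely birthdays, no leap years). The name exact_birthday_probability is introduced here for illustration and is not part of the notebook.

```python
import math

def birthday_probability(number_of_people):
    # Pairwise approximation, as in the notebook
    return 1 - (364 / 365) ** math.comb(number_of_people, 2)

def exact_birthday_probability(number_of_people):
    # Hypothetical helper: 1 - P(all birthdays are distinct)
    p_distinct = 1.0
    for i in range(number_of_people):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

for n in (10, 15, 23):
    approx = birthday_probability(n)
    exact = exact_birthday_probability(n)
    print(n, round(approx, 4), round(exact, 4))
```

For group sizes close to an NBA roster, the two values should differ only slightly, so the simpler approximation is a reasonable choice for this project.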
NBA Birthday Paradox Analysis
In [8]:
df = pd.read_csv('nba_2017.csv', parse_dates=['Birth Date'])
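To see what parse_dates does without the full file, here is a small sketch that reads three rows copied from this notebook's df.head() output from an in-memory string. It assumes the dates are stored in ISO format, which may not match the real nba_2017.csv.

```python
import io
import pandas as pd

# Three rows copied from the df.head() output; the ISO date format is an assumption
csv_text = """Player,Pos,Age,Team,Birth Date
Alex Abrines,SG,23.0,Oklahoma City Thunder,1993-08-01
Quincy Acy,PF,26.0,Dallas Mavericks,1990-10-06
Steven Adams,C,23.0,Oklahoma City Thunder,1993-07-20
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Birth Date"])

# The column is now a datetime type, so the .dt accessor is available
print(sample.dtypes["Birth Date"])
print(sample["Birth Date"].dt.month.tolist())
```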
In [9]:
df.head()
Out[9]:
| | Player | Pos | Age | Team | Birth Date |
---|---|---|---|---|---|
0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 |
1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 |
2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 |
3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 |
4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 |
4. Create the Birthday column
In [ ]:
df['Birth Date'].dt.strftime("%Y-%m-%d").head()
In [13]:
df["Birthday"] = df['Birth Date'].dt.strftime("%m-%d")
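Dropping the year matters because two players share a birthday when the month and day match, regardless of birth year. A minimal sketch with two made-up dates (not taken from the dataset) shows the difference between the two format strings:

```python
import pandas as pd

# Two illustrative dates: different years, same calendar day
dates = pd.Series(pd.to_datetime(["1990-10-06", "1985-10-06"]))

full = dates.dt.strftime("%Y-%m-%d")    # keeps the year
month_day = dates.dt.strftime("%m-%d")  # drops the year

print(full[0] == full[1])            # False
print(month_day[0] == month_day[1])  # True
```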
Interlude: Combinatorics
For this project, you're free to use any technique you prefer to find how many players share a birthday on a given team. One recommendation is to use combinatorics, specifically combinations, via the itertools.combinations function. Here's a quick example. Suppose we have these samples:
Name | Birthday |
---|---|
John | March 5th |
Mary | Sept 20th |
Rob | March 5th |
Using combinations, we can take all the samples in pairs (r=2) to compare them:
Person 1 | Person 2 |
---|---|
John | Mary |
John | Rob |
Mary | Rob |
Using Python:
In [21]:
from itertools import combinations
In [17]:
names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]
In [18]:
# Note: we need to wrap it in a list to force display
list(combinations(names, 2))
Out[18]:
[('John', 'Mary'), ('John', 'Rob'), ('Mary', 'Rob')]
In [19]:
# Note: we need to wrap it in a list to force display
list(combinations(birthdays, 2))
Out[19]:
[('March 5th', 'Sept 20th'), ('March 5th', 'March 5th'), ('Sept 20th', 'March 5th')]
We can see that the second pair, March 5th and March 5th (John and Rob), is the same date. Using Pandas:
In [ ]:
names_df = pd.DataFrame(combinations(names, 2), columns=["Person 1", "Person 2"])
names_df
In [ ]:
birthdays_df = pd.DataFrame(combinations(birthdays, 2), columns=["Birthday 1", "Birthday 2"])
birthdays_df
Combining it:
In [ ]:
df_concat = pd.concat([names_df, birthdays_df], axis=1)
In [ ]:
df_concat
In [ ]:
df_concat['Birthday 1'] == df_concat['Birthday 2']
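The two-DataFrame approach above relies on both combinations calls producing pairs in the same order. As an alternative sketch (not the notebook's method), names and birthdays can be zipped first so each person stays attached to their date:

```python
from itertools import combinations
import pandas as pd

names = ["John", "Mary", "Rob"]
birthdays = ["March 5th", "Sept 20th", "March 5th"]

# Pair up (name, birthday) tuples, then flatten each pair into one row
rows = [
    (n1, n2, b1, b2)
    for (n1, b1), (n2, b2) in combinations(zip(names, birthdays), 2)
]
pairs_df = pd.DataFrame(
    rows, columns=["Person 1", "Person 2", "Birthday 1", "Birthday 2"]
)

# Keep only the pairs whose birthdays match
shared = pairs_df[pairs_df["Birthday 1"] == pairs_df["Birthday 2"]]
print(shared)
```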
End of the interlude! Now, it's your turn to answer questions.
Activities
5. How many pairs of players share a birthday for the Atlanta Hawks?
In [20]:
df.head()
Out[20]:
| | Player | Pos | Age | Team | Birth Date | Birthday |
---|---|---|---|---|---|---|
0 | Alex Abrines | SG | 23.0 | Oklahoma City Thunder | 1993-08-01 | 08-01 |
1 | Quincy Acy | PF | 26.0 | Dallas Mavericks | 1990-10-06 | 10-06 |
2 | Quincy Acy | PF | 26.0 | Brooklyn Nets | 1990-10-06 | 10-06 |
3 | Steven Adams | C | 23.0 | Oklahoma City Thunder | 1993-07-20 | 07-20 |
4 | Arron Afflalo | SG | 31.0 | Sacramento Kings | 1985-10-15 | 10-15 |
In [38]:
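# Filter to a single team. As shown, this cell selects the Dallas Mavericks
# (used for activity 7); use 'Atlanta Hawks' here to answer activity 5.
new_df = df.loc[
    df['Team'] == 'Dallas Mavericks'
]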
In [39]:
dates = pd.DataFrame(combinations(new_df['Birthday'], 2), columns = ['Birthday 1', 'Birthday 2'])
In [40]:
players = pd.DataFrame(combinations(new_df['Player'], 2), columns = ['Person 1', 'Person 2'])
In [41]:
final_df = pd.concat([players, dates], axis=1)
In [37]:
(final_df['Birthday 1'] == final_df['Birthday 2']).sum()
Out[37]:
1
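The same count can also be obtained without building every pair. As a sketch of an alternative technique (not the one used above), count how many players fall on each day and apply k * (k - 1) / 2 per day. The roster below is made up for illustration.

```python
import pandas as pd

# Hypothetical roster; names and dates are invented for this example
team = pd.DataFrame({
    "Player": ["A", "B", "C", "D", "E"],
    "Birthday": ["06-26", "10-06", "06-26", "03-29", "06-26"],
})

counts = team["Birthday"].value_counts()

# k players born on the same day form k * (k - 1) / 2 distinct pairs
shared_pairs = int((counts * (counts - 1) // 2).sum())
print(shared_pairs)  # 3
```

Here three players share 06-26, which gives three pairs, matching what the pairwise comparison would count.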
6. How many pairs of players share a birthday in the Cleveland Cavaliers?
In [ ]:
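Since the steps are identical for every team, they could be wrapped in a function. The sketch below uses a hypothetical helper, count_shared_birthday_pairs, and made-up data with the same column names as the notebook's DataFrame; it does not reproduce the answer for any real team.

```python
from itertools import combinations
import pandas as pd

def count_shared_birthday_pairs(df, team):
    # Hypothetical helper: number of player pairs on `team` with equal "Birthday"
    birthdays = df.loc[df["Team"] == team, "Birthday"]
    return sum(b1 == b2 for b1, b2 in combinations(birthdays, 2))

# Made-up data using the notebook's column names
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Team": ["X", "X", "X", "Y"],
    "Birthday": ["01-15", "01-15", "07-04", "01-15"],
})

print(count_shared_birthday_pairs(toy, "X"))  # 1
```

With the real data, the call would take the notebook's df and a team name such as 'Cleveland Cavaliers'.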
7. In the Dallas Mavericks, who shares a birthday with J.J. Barea?
In [43]:
final_df.head()
Out[43]:
| | Person 1 | Person 2 | Birthday 1 | Birthday 2 |
---|---|---|---|---|
0 | Quincy Acy | Justin Anderson | 10-06 | 11-19 |
1 | Quincy Acy | J.J. Barea | 10-06 | 06-26 |
2 | Quincy Acy | Harrison Barnes | 10-06 | 05-30 |
3 | Quincy Acy | Ben Bentil | 10-06 | 03-29 |
4 | Quincy Acy | Andrew Bogut | 10-06 | 11-28 |
In [45]:
final_df.loc[
    (
        (final_df['Person 1'] == 'J.J. Barea') |
        (final_df['Person 2'] == 'J.J. Barea')
    ) &
    (final_df['Birthday 1'] == final_df['Birthday 2'])
].head()
Out[45]:
| | Person 1 | Person 2 | Birthday 1 | Birthday 2 |
---|---|---|---|---|
65 | J.J. Barea | Deron Williams | 06-26 | 06-26 |
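A question about a single player can also be answered without the pairs table: look up that player's birthday, then filter the roster for everyone else with the same value. A minimal sketch on a made-up roster that uses the notebook's column names:

```python
import pandas as pd

# Made-up roster; player names and dates are placeholders
roster = pd.DataFrame({
    "Player": ["A", "B", "C"],
    "Birthday": ["06-26", "10-06", "06-26"],
})

target = "A"

# Look up the target player's birthday, then find other players with the same one
target_birthday = roster.loc[roster["Player"] == target, "Birthday"].iloc[0]
matches = roster[
    (roster["Birthday"] == target_birthday) & (roster["Player"] != target)
]
print(matches["Player"].tolist())  # ['C']
```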