Data Wrangling with Pandas
Unscripted Insights: Data Wrangling with F.R.I.E.N.D.S¶
Starting Off in the Real World of Data with F.R.I.E.N.D.S!¶
In [1]:
# Importing necessary libraries for data manipulation and visualization
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Setting visual preferences for plotting
plt.style.use('ggplot')
# Loading the datasets
friends_df = pd.read_csv('friends.csv')
friends_info_df = pd.read_csv('friends_info.csv')
In [2]:
friends_df.head()
Out[2]:
  | text | speaker | season | episode | scene | utterance |
---|---|---|---|---|---|---|
0 | There's nothing to tell! He's just some guy I ... | Monica Geller | 1 | 1 | 1 | 1 |
1 | C'mon, you're going out with the guy! There's ... | Joey Tribbiani | 1 | 1 | 1 | 2 |
2 | All right Joey, be nice. So does he have a hum... | Chandler Bing | 1 | 1 | 1 | 3 |
3 | Wait, does he eat chalk? | Phoebe Buffay | 1 | 1 | 1 | 4 |
4 | (They all stare, bemused.) | Scene Directions | 1 | 1 | 1 | 5 |
In [3]:
friends_info_df.head()
Out[3]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | The Pilot | James Burrows | David Crane & Marta Kauffman | 1994-09-22 | 21.5 | 8.3 |
1 | 1 | 2 | The One with the Sonogram at the End | James Burrows | David Crane & Marta Kauffman | 1994-09-29 | 20.2 | 8.1 |
2 | 1 | 3 | The One with the Thumb | James Burrows | Jeffrey Astrof & Mike Sikowitz | 1994-10-06 | 19.5 | 8.2 |
3 | 1 | 4 | The One with George Stephanopoulos | James Burrows | Alexa Junge | 1994-10-13 | 19.7 | 8.1 |
4 | 1 | 5 | The One with the East German Laundry Detergent | Pamela Fryman | Jeff Greenstein & Jeff Strauss | 1994-10-20 | 18.6 | 8.5 |
In [4]:
friends_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67373 entries, 0 to 67372
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   text       67373 non-null  object
 1   speaker    67097 non-null  object
 2   season     67373 non-null  int64
 3   episode    67373 non-null  int64
 4   scene      67373 non-null  int64
 5   utterance  67373 non-null  int64
dtypes: int64(4), object(2)
memory usage: 3.1+ MB
In [5]:
friends_info_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 236 entries, 0 to 235
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   season             236 non-null    int64
 1   episode            236 non-null    int64
 2   title              236 non-null    object
 3   directed_by        236 non-null    object
 4   written_by         236 non-null    object
 5   air_date           236 non-null    object
 6   us_views_millions  236 non-null    float64
 7   imdb_rating        236 non-null    float64
dtypes: float64(2), int64(2), object(4)
memory usage: 14.9+ KB
Let's Start!¶
Using Built-in Aggregation Functions¶
1. Who Talks the Most?¶
In [8]:
# Use the 'speaker' and 'text' columns to count the number of dialogues per character
friends_df.groupby('speaker')['text'].count().idxmax()
Out[8]:
'Rachel Green'
2. Seasonal Dialogue Sum¶
In [9]:
friends_df['text'].str.split()
Out[9]:
0        [There's, nothing, to, tell!, He's, just, some...
1        [C'mon,, you're, going, out, with, the, guy!, ...
2        [All, right, Joey,, be, nice., So, does, he, h...
3                            [Wait,, does, he, eat, chalk?]
4                           [(They, all, stare,, bemused.)]
                                ...
67368                         [Oh,, it's, gonna, be, okay.]
67369    [Do, you, guys, have, to, go, to, the, new, ho...
67370                                [We, got, some, time.]
67371               [Okay,, should, we, get, some, coffee?]
67372                                       [Sure., Where?]
Name: text, Length: 67373, dtype: object
In [11]:
# First, calculate the word count for each dialogue
friends_df['word_count'] = friends_df['text'].str.split().apply(len)
friends_df['word_count']
Out[11]:
0        11
1        14
2        16
3         5
4         4
         ..
67368     5
67369    18
67370     4
67371     6
67372     2
Name: word_count, Length: 67373, dtype: int64
In [12]:
# Now sum these word counts by season and store the result in the variable `seasonal_word_sum`
seasonal_word_sum = friends_df.groupby('season')['word_count'].sum()
seasonal_word_sum
Out[12]:
season
1     65205
2     64129
3     75710
4     74817
5     74405
6     77765
7     73662
8     71555
9     75523
10    55237
Name: word_count, dtype: int64
3. Average Episode Length¶
In [14]:
# Count the number of unique scene values per season, then take the mean across seasons
friends_df.groupby('season')['scene'].nunique().mean()
Out[14]:
19.5
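The cell above averages unique scene numbers per season. If you want the mean number of unique scenes *per episode* instead, group by both `season` and `episode` first. A minimal self-contained sketch (the column names mirror the notebook, but the values below are made up):

```python
import pandas as pd

# Toy dialogue table: scene numbers restart within each episode (made-up values)
toy = pd.DataFrame({
    'season':  [1, 1, 1, 1, 1, 1],
    'episode': [1, 1, 1, 2, 2, 2],
    'scene':   [1, 2, 2, 1, 1, 1],
})

# Unique scenes per (season, episode) pair, then the mean across all episodes
avg_scenes = toy.groupby(['season', 'episode'])['scene'].nunique().mean()
print(avg_scenes)  # episode 1 has 2 unique scenes, episode 2 has 1 -> mean 1.5
```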
4. Shortest and Longest Dialogues¶
In [15]:
# First, compute the length of each dialogue
friends_df['dialogue_length'] = friends_df['text'].str.len()
friends_df['dialogue_length']
Out[15]:
0        56
1        80
2        72
3        24
4        26
         ..
67368    23
67369    77
67370    17
67371    32
67372    12
Name: dialogue_length, Length: 67373, dtype: int64
In [16]:
# Now find the minimum and maximum dialogue length for each character using agg()
dialogue_lengths = friends_df.groupby('speaker')['dialogue_length'].agg(['min', 'max'])
dialogue_lengths
Out[16]:
speaker | min | max |
---|---|---|
#ALL# | 2 | 73 |
1st Customer | 25 | 25 |
A Casino Boss | 52 | 52 |
A Crew Member | 14 | 16 |
A Disembodied Voice | 15 | 15 |
... | ... | ... |
Woman On Tv | 19 | 80 |
Woman's Voice | 74 | 74 |
Writer | 71 | 71 |
Zack | 5 | 165 |
Zoe | 30 | 30 |
699 rows × 2 columns
5. Comprehensive Character Stats¶
In [18]:
# Use all the aggregation functions: 'mean', 'std', 'min', 'max', 'median'
char_stats = friends_df.groupby('speaker')['dialogue_length'].agg(['mean', 'std', 'min', 'max', 'median'])
char_stats
Out[18]:
speaker | mean | std | min | max | median |
---|---|---|---|---|---|
#ALL# | 12.028818 | 11.069892 | 2 | 73 | 8.0 |
1st Customer | 25.000000 | NaN | 25 | 25 | 25.0 |
A Casino Boss | 52.000000 | NaN | 52 | 52 | 52.0 |
A Crew Member | 15.333333 | 1.154701 | 14 | 16 | 16.0 |
A Disembodied Voice | 15.000000 | NaN | 15 | 15 | 15.0 |
... | ... | ... | ... | ... | ... |
Woman On Tv | 41.000000 | 33.867388 | 19 | 80 | 24.0 |
Woman's Voice | 74.000000 | NaN | 74 | 74 | 74.0 |
Writer | 71.000000 | NaN | 71 | 71 | 71.0 |
Zack | 41.300000 | 39.095161 | 5 | 165 | 38.5 |
Zoe | 30.000000 | NaN | 30 | 30 | 30.0 |
699 rows × 5 columns
6. Understanding .groupby() Function¶
In [ ]:
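Since this cell is left empty, here is a minimal, self-contained sketch of what `.groupby()` does, using a made-up toy DataFrame rather than the `friends.csv` data. Conceptually it follows the split-apply-combine pattern: split rows into groups by a key, apply a function to each group, and combine the results:

```python
import pandas as pd

# Toy data standing in for the dialogue table (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Monica', 'Joey', 'Monica', 'Joey', 'Ross'],
    'words':   [11, 14, 16, 5, 4],
})

# Split by speaker, apply sum to each group's 'words', combine into one Series
words_per_speaker = toy.groupby('speaker')['words'].sum()
print(words_per_speaker)  # Monica: 11 + 16 = 27, Joey: 19, Ross: 4
```

The intermediate `toy.groupby('speaker')` object is lazy: nothing is computed until an aggregation (like `.sum()`, `.count()`, or `.agg()`) is applied.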
Just for Exploration¶
Which Episode Won the Hearts of the Audience?¶
F.R.I.E.N.D.S has always been more than just a TV show. Each episode is like catching up with old friends!
As we delve into the data, let's find out which episode truly stood out as the fan favorite. We'll analyze views and IMDb ratings to uncover which storylines and moments captivated viewers the most.
In [20]:
# Calculate a score by combining views and IMDb ratings
friends_info_df['score'] = friends_info_df['us_views_millions'] + friends_info_df['imdb_rating']
# Get the top 10 episodes by score
top_episodes = friends_info_df.sort_values(by='score', ascending=False).head(10)
# Set up the matplotlib figure
plt.figure(figsize=(8, 4))
sns.barplot(x='score', y='title', data=top_episodes, palette='viridis')
# Add titles and labels
plt.title('Top 10 Most Loved Episodes of Friends', fontsize=16)
plt.xlabel('Score (US Views Millions + IMDb Rating)', fontsize=14)
plt.ylabel('Episode Title', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Display the plot
plt.show()
Our analysis leads us to "The Last One," the series finale. This finale beautifully encapsulated the essence of friendship and change, leaving a lasting impression and securing its place as a perennial favorite among the show’s fans.
Defining and Using Custom Aggregation Functions with .agg()¶
7. Custom Aggregation: Unique Words¶
In [21]:
# Define a function to count unique words per character's dialogues
def unique_word_count(texts):
return len(set(' '.join(texts).split()))
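Before applying it to the full dataset, a quick sanity check of `unique_word_count` on a toy list of dialogues helps; note that the function is case- and punctuation-sensitive, so "Joey" and "joey" would count as different words:

```python
# Same helper as above: join all dialogues, split on whitespace, count distinct tokens
def unique_word_count(texts):
    return len(set(' '.join(texts).split()))

# Toy dialogues (made up): 'how' and 'you' repeat, so only 4 words are unique
sample = ['how you doin', 'how are you']
print(unique_word_count(sample))  # {'how', 'you', 'doin', 'are'} -> 4
```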
In [22]:
# Group dialogues by speaker and aggregate them with the custom unique-word counter
unique_words_per_character = friends_df.groupby('speaker')['text'].agg(unique_word_count)
unique_words_per_character
Out[22]:
speaker
#ALL#                  394
1st Customer             3
A Casino Boss            9
A Crew Member            9
A Disembodied Voice      2
                      ...
Woman On Tv             22
Woman's Voice           13
Writer                  12
Zack                   119
Zoe                      4
Name: text, Length: 699, dtype: int64
8. Phoebe’s Family History¶
In [24]:
def most_frequent_mentions(dialogues):
words = pd.Series(' '.join(dialogues).split())
return words[words.str.lower().isin(['mom', 'dad', 'sister', 'brother', 'family'])].value_counts().idxmax()
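A quick check of `most_frequent_mentions` on toy dialogues (made up, not from the dataset): the match is case-insensitive, but the returned word keeps whichever casing was most frequent, since `value_counts()` runs on the original tokens:

```python
import pandas as pd

# Same helper as above: find the most frequent family-related word in a list of dialogues
def most_frequent_mentions(dialogues):
    words = pd.Series(' '.join(dialogues).split())
    return words[words.str.lower().isin(['mom', 'dad', 'sister', 'brother', 'family'])].value_counts().idxmax()

# Toy dialogues: 'mom' appears 3 times, 'dad' twice
sample = ['my mom called', 'mom and dad', 'tell dad', 'mom again']
print(most_frequent_mentions(sample))  # -> 'mom'
```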
In [25]:
# Apply this function to group dialogue by season and aggregate most frequent mentions
family_mentions = friends_df[friends_df['speaker'] == 'Phoebe Buffay'].groupby('season')['text'].agg(most_frequent_mentions)
family_mentions
Out[25]:
season
1        mom
2        dad
3        Mom
4        mom
5        mom
6     sister
7        mom
8        dad
9     family
10    family
Name: text, dtype: object
9. Using Aggregation Functions¶
In [ ]:
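This placeholder cell is a natural spot to combine built-in and custom functions in a single `.agg()` call. A self-contained sketch on toy data (made-up values, not the Friends dialogues): when you pass a list mixing string names and callables, the custom function's `__name__` becomes its column label:

```python
import pandas as pd

# Toy dialogue lengths per speaker (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'dialogue_length': [10, 30, 20, 40],
})

# Custom aggregation: spread between longest and shortest line
def length_range(s):
    return s.max() - s.min()

# Mix built-in names with the custom callable in one agg() call
stats = toy.groupby('speaker')['dialogue_length'].agg(['mean', 'max', length_range])
print(stats)  # columns: mean, max, length_range
```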
Grouping by Multiple Columns¶
10. The One Where Joey Speaks¶
In [27]:
# Use the 'speaker', 'text' and 'season' columns to count the number of dialogues by Joey per season
joey_lines = friends_df[friends_df['speaker'] == 'Joey Tribbiani'].groupby('season')['text'].count()
joey_lines
Out[27]:
season
1     640
2     654
3     774
4     838
5     935
6     909
7     933
8     909
9     856
10    723
Name: text, dtype: int64
11. Chandler's Job Mystery¶
In [30]:
# Step 1: Filter rows containing "job" or "work"
job_work_mentions = friends_df[friends_df['text'].str.lower().str.contains('job|work', regex=True)]
In [32]:
# Step 2: Filter rows where speaker is Chandler Bing
chandler_mentions = job_work_mentions[job_work_mentions['speaker'] == 'Chandler Bing']
In [33]:
# Step 3: Group by season
grouped_by_season = chandler_mentions.groupby('season')
In [34]:
# Step 4: Count the number of rows in each group
chandler_job_explanations = grouped_by_season.size()
chandler_job_explanations
Out[34]:
season
1     17
2     17
3     12
4     11
5     14
6     19
7     12
8     17
9     31
10    13
dtype: int64
Just for Exploration¶
The Dynamics of Ross and Rachel’s Relationship¶
Ross and Rachel's relationship is a central storyline in the show. Their conversations show how their relationship changes over time, from beginning to end. By looking at their dialogues across all the seasons, you can see the highs and lows of their connection. Let’s map out their dialogues to understand how their story develops and changes throughout the series.
In [51]:
# Run the below cell to try to understand this relationship!
In [35]:
# Filter dialogues between Ross and Rachel
ross_rachel_dialogues = friends_df[(friends_df['speaker'] == 'Ross Geller') & (friends_df['text'].str.contains('Rachel')) |
(friends_df['speaker'] == 'Rachel Green') & (friends_df['text'].str.contains('Ross'))]
# Group by season and count dialogues
dialogues_per_season = ross_rachel_dialogues.groupby('season').size()
# Plotting
plt.figure(figsize=(10, 6))
dialogues_per_season.plot(kind='line', marker='o', linestyle='-', color='blue')
plt.title('Ross and Rachel Dialogues Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Dialogues')
plt.grid(True)
plt.show()
12. Flashback Flashes¶
In [37]:
# Identify episodes with frequent flashbacks by searching for phrases like 'remember when' and 'back when'.
# Then group and count these instances by season and episode, highlighting how the series revisits its past.
flashbacks = friends_df[friends_df['text'].str.contains('remember when|back when', case = False)]
In [38]:
flashback_mentions = flashbacks.groupby(['season', 'episode']).size()
flashback_mentions
Out[38]:
season  episode
1       1          1
        2          1
        3          1
        6          1
        13         1
        19         1
2       4          1
        6          1
        12         1
        14         1
        16         1
        24         1
3       3          1
        10         1
        13         1
        15         1
        17         1
4       8          1
        23         1
5       2          1
        5          1
        6          1
        8          1
        10         2
6       6          1
        7          2
        10         1
        17         1
        18         1
7       2          1
        3          1
        6          1
        15         1
8       5          1
        18         1
9       6          1
        24         1
dtype: int64
13. The One with the Catchphrases¶
In [40]:
# Note: the trailing "?" is a regex quantifier here, making the apostrophe optional,
# so this also matches "how you doin" without the apostrophe
catchphrase = friends_df[friends_df['text'].str.contains("how you doin'?", case=False)]
In [41]:
# Use groupby() over `speaker` column
joey_catchphrases = catchphrase.groupby('speaker').size()
joey_catchphrases
Out[41]:
speaker
Dana Keystone        1
Dr. Franzblau        1
Frank Buffay Jr.     1
Joey Tribbiani      25
Monica Geller        2
Rachel Green         4
Ross Geller          4
Susan Bunch          1
Susie Moss           1
Tag Jones            1
dtype: int64
14. Ross's Weddings¶
In [43]:
# Filter data where text mentions 'wedding' and 'speaker' is Ross Geller
ross_weddings = friends_df[(friends_df['text'].str.contains("wedding", case = False)) & (friends_df['speaker'] == 'Ross Geller')]
In [44]:
# Group ross_weddings by season and episode, and count entries
episode_counts = ross_weddings.groupby(['season', 'episode']).size()
In [45]:
# Find the episode with the most dialogues about weddings
max_wedding_episode = episode_counts.idxmax()
In [46]:
# Print the result
print(f"Ross's wedding episode with the most dialogue is in Season {max_wedding_episode[0]}, Episode {max_wedding_episode[1]} with {episode_counts[max_wedding_episode]} mentions.")
Ross's wedding episode with the most dialogue is in Season 4, Episode 23 with 6 mentions.
Applying Functions to Groups with .apply() and .transform()¶
15. Filtering with filter()¶
In [ ]:
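Section 15's cell is left empty; as a self-contained sketch (toy data with made-up lines, not the Friends dialogues), `.filter()` keeps or drops *whole groups* based on a condition evaluated on each group, returning rows in their original shape rather than an aggregate:

```python
import pandas as pd

# Toy dialogue table (hypothetical lines)
toy = pd.DataFrame({
    'speaker': ['Gunther', 'Joey', 'Joey', 'Joey'],
    'text': ['Rachel!', 'How you doin', 'Pizza', 'Sandwich'],
})

# Keep only speakers with more than one line; Gunther's single line is dropped
talkative = toy.groupby('speaker').filter(lambda g: len(g) > 1)
print(talkative)  # only Joey's three rows remain
```

Unlike `.agg()`, the result is a plain DataFrame with the surviving original rows, so you can keep wrangling it with the usual tools.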
16. Monica's Cleaning Episodes¶
In [ ]:
In [ ]:
friends_df[friends_df['speaker'] == 'Monica Geller']
In [51]:
friends_df[friends_df['text'].str.contains('clean|dust|soap', case=False)]
Out[51]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length |
---|---|---|---|---|---|---|---|---|
30 | No, no don't! Stop cleansing my aura! No, just... | Ross Geller | 1 | 1 | 1 | 31 | 14 | 73 |
141 | I know, I know, I'm such an idiot. I guess I s... | Paul the Wine Guy | 1 | 1 | 5 | 2 | 35 | 165 |
262 | You're welcome. I remember when I first came t... | Phoebe Buffay | 1 | 1 | 14 | 6 | 72 | 390 |
451 | All right, you guys, I kinda gotta clean up now. | Rachel Green | 1 | 2 | 5 | 10 | 10 | 48 |
455 | (Joey turns off the lights, and they all leave... | Scene Directions | 1 | 2 | 5 | 14 | 20 | 108 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
66235 | I mean, this soap opera is a great gig, but...... | Joey Tribbiani | 10 | 14 | 13 | 1 | 36 | 192 |
66588 | I forgot to pick up my dry cleaning! | Joey Tribbiani | 10 | 16 | 4 | 27 | 8 | 36 |
66720 | Well, I was cleaning out the closet and I foun... | Chandler Bing | 10 | 16 | 12 | 18 | 16 | 84 |
66883 | We'll just get him cleaned up a bit. | Nurse | 10 | 17 | 5 | 22 | 8 | 36 |
67110 | Damn, that window is clean. | Phoebe Buffay | 10 | 17 | 12 | 16 | 5 | 27 |
192 rows × 8 columns
In [53]:
# Filter rows where Monica Geller is the speaker, then group by season and count lines
# mentioning "clean", "dust", or "soap" (note: this match is case-sensitive)
cleaning_mentions = friends_df[(friends_df['speaker'] == 'Monica Geller') & (friends_df['text'].str.contains('clean|dust|soap', case=True))].groupby('season')['text'].count()
cleaning_mentions
Out[53]:
season
1     1
2     2
4     3
5     3
6     2
7     1
8     8
9     4
10    4
Name: text, dtype: int64
In [54]:
# Alternative: sum occurrences with str.count(), so a line mentioning "clean" twice counts twice
cleaning_mentions = friends_df[friends_df['speaker'] == 'Monica Geller'].groupby('season')['text'].apply(lambda x: x.str.count('clean|dust|soap').sum())
cleaning_mentions
Out[54]:
season
1     1
2     2
3     0
4     3
5     4
6     2
7     1
8     9
9     6
10    4
Name: text, dtype: int64
Just for Exploration¶
Coffee Time Trends at Central Perk¶
Central Perk is more than just a coffee shop in "F.R.I.E.N.D.S"; it's where countless iconic moments unfolded. Ever wondered how the gang's visits to Central Perk changed over the seasons? Let’s visualize the number of scenes set in Central Perk per season to see how the group's coffee habits evolved.
In [ ]:
# Run the below cell to understand Coffee Time Trends at Central Perk
In [56]:
# Filter for scenes at Central Perk
central_perk_scenes = friends_df[friends_df['text'].str.contains('Central Perk', na=False)]
# Group by season and count the number of scenes
scenes_per_season = central_perk_scenes.groupby('season').size()
# Plotting
plt.figure(figsize=(10, 6))
scenes_per_season.plot(kind='bar', color='brown', alpha=0.7)
plt.title('Number of Central Perk Scenes Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Scenes')
plt.xticks(rotation=0)
plt.show()
17. The One with the Longest Monologue¶
In [ ]:
In [64]:
# For each season, find the row index of the longest dialogue (maximum string length in 'text')
max_dialogue_length = friends_df.groupby('season')['text'].apply(lambda x: x.str.len().idxmax())
max_dialogue_length
Out[64]:
season
1      3700
2      6627
3     14558
4     26127
5     28882
6     35444
7     47565
8     53426
9     57472
10    63950
Name: text, dtype: int64
In [67]:
friends_df.loc[63950, 'text']
Out[67]:
"[Scene: Monica and Chandler's. They are preparing to show Laura around. Laura is standing with her back to the window, Chandler and Monica are standing on either side of her, facing each other. Laura: Well, I must say, this seems like a lovely environment to raise a child in. Monica: Oh, by the way, you are more than welcome to look under any of the furniture, because, believe me, you won't find any porn or cigarettes under there! Laura: Oh! Well, actually, before we look around, let me make sure I have everything I need up to here... (She starts checking her form. Chandler sees movement near the window from the corner of his eye and when he looks he spots Joey climbing up the fire escape and onto their balcony. He warns Monica silently.) Monica: (Pulls Laura into the spare room) Why don't I show you the baby's room? (Joey enters through the side window and jogs towards the kitchen holding a baseball bat) Chandler: What the hell are you doing? Joey: Well, you wouldn't let me in, so I thought you were in trouble. Chandler: Well, we're not. Joey: But you called me 'Bert'!? That's our code word for danger! Chandler: We don't have a code word. Joey: We don't? We really should. From now on, 'Bert' will be our code word for danger. (Monica talks loudly in the baby's room) Monica: So that was the baby's room. (They come out and Chandler throws Joey behind the couch and puts his foot on him. Monica looks at Chandler) Monica: (To Chandler) What room should we see next? Chandler: Any room that isn't behind this couch! (laughs nervously) Monica: (laughs nervously as well, Laura looks confused) (To Laura) Some people don't get him, but I think he's really funny! (She takes Laura to their own bedroom). (Joey gets up and look annoyed) Joey: (quivering with anger) I did not care for that! Chandler: (escorting Joey to the door) You have to get out of here. You slept with our social worker and you never called her back and she is still pissed, so she can't see you. Joey: Ok, ok! 
(He leaves) Chandler: Ok! (Joey leaves and closes the door behind him. Chandler walks towards the living room, but then Joey enters again.) Chandler: What? Joey: I forgot my bat. (He picks up his bat and holds it up, but then Monica and Laura enter the living room again. When Laura sees Joey, she freezes...) Laura: Oh my God! Chandler: And for the last time, we do not want to be friends with you! And we don't want to buy your bat! (Joey lowers his bat) Laura: What are you doing here? Joey: (to Chandler) Bert! Bert! Bert! Bert! Laura: Are you friends with him? Chandler: I can explain... Joey... Joey: Uhm... ok... uhm... Well, yeah... You have got some nerve, coming back here. I can't believe you never called me. Laura: Excuse me? Joey: Oh... yeah... Probably you don't even remember my name. It's Joey, by the way. And don't bother telling me yours, because I totally remember it... lady. Yeah! I waited weeks for you to call me. Laura: I gave you my number, you never called me. Joey: No, no! Don't try to turn this around on me, ok? I'm not some kind of... social work, ok, that you can just... do. Laura: (embarrassed towards Chandler and Monica) Well, I'm pretty sure I gave you my number. Joey: Really? Think about it. Come on! You're a beautiful woman, smart, funny, we had a really good time, huh? If I had your number, why wouldn't I call you? Laura: I don't know... Well, maybe I'm wrong... I'm sorry... Joey: No, no, hey, no! Too late for apologies... ok? You broke my heart. You know how many women I had to sleep with to get over you? (and he leaves the apartment, leaving her shocked) Laura: Joey, wait! Joey: (acting sad) NO! I waited a long time, I can't wait anymore... (and closes the door behind him) Laura: (laughing nervously) I'm sorry that you had to see that. I'm so embarrassed... Chandler: Oh, that's really ok. Monica: Yeah, that we totally understand. Dating is hard. Laura: Boy, you people are nice... And I've got to say... 
I think you're going to make excellent parents. (Chandler and Monica hug each other, and then Joey enters the apartment again.) Joey: LAURA! (and points to her, very confident)[Scene: The New York City Children's Fund building. Phoebe and Mike are entering.]"
18. The One with the Routine¶
In [68]:
friends_info_df.head()
Out[68]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating | score |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | The Pilot | James Burrows | David Crane & Marta Kauffman | 1994-09-22 | 21.5 | 8.3 | 29.8 |
1 | 1 | 2 | The One with the Sonogram at the End | James Burrows | David Crane & Marta Kauffman | 1994-09-29 | 20.2 | 8.1 | 28.3 |
2 | 1 | 3 | The One with the Thumb | James Burrows | Jeffrey Astrof & Mike Sikowitz | 1994-10-06 | 19.5 | 8.2 | 27.7 |
3 | 1 | 4 | The One with George Stephanopoulos | James Burrows | Alexa Junge | 1994-10-13 | 19.7 | 8.1 | 27.8 |
4 | 1 | 5 | The One with the East German Laundry Detergent | Pamela Fryman | Jeff Greenstein & Jeff Strauss | 1994-10-20 | 18.6 | 8.5 | 27.1 |
In [73]:
# Find seasons whose episode titles mention 'Routine' or 'Dance' and count those episodes
# Use the friends_info_df!
dance_music_dialogues = friends_info_df[friends_info_df['title'].apply(lambda x: 'Routine' in x or 'Dance' in x)].groupby('season').size()
dance_music_dialogues
Out[73]:
season
6    1
dtype: int64
19. Chandler in a Box¶
In [80]:
friends_df.head()
Out[80]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length |
---|---|---|---|---|---|---|---|---|
0 | There's nothing to tell! He's just some guy I ... | Monica Geller | 1 | 1 | 1 | 1 | 11 | 56 |
1 | C'mon, you're going out with the guy! There's ... | Joey Tribbiani | 1 | 1 | 1 | 2 | 14 | 80 |
2 | All right Joey, be nice. So does he have a hum... | Chandler Bing | 1 | 1 | 1 | 3 | 16 | 72 |
3 | Wait, does he eat chalk? | Phoebe Buffay | 1 | 1 | 1 | 4 | 5 | 24 |
4 | (They all stare, bemused.) | Scene Directions | 1 | 1 | 1 | 5 | 4 | 26 |
In [100]:
len(['Hey', '', "it's", 'me.', 'I', 'know', 'you', "can't", 'stand'])
Out[100]:
9
In [77]:
# First, let's filter out the episode using the `title` column of the `friends_info_df` dataset.
# Episode : 'The One with Chandler in a Box'
chandler_box = friends_info_df[friends_info_df['title'] == 'The One with Chandler in a Box']
chandler_box
Out[77]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating | score |
---|---|---|---|---|---|---|---|---|---|
80 | 4 | 8 | The One with Chandler in a Box | Peter Bonerz | Michael Borkow | 1997-11-20 | 26.8 | 9.1 | 35.9 |
In [107]:
# Calculate the average number of words spoken by each character in 'The One with Chandler in a Box'
# by grouping the dialogue by speaker and applying a function to count words per dialogue.
# Hint: use the season and episode numbers obtained above to filter the `friends_df` dataset.
avg_words_per_character = friends_df[(friends_df['season'] == 4) & (friends_df['episode'] == 8)].groupby('speaker')['text'].apply(lambda x: x.str.split().apply(len).mean())
avg_words_per_character
Out[107]:
speaker
Chandler Bing       13.022727
Doctor               8.000000
Gunther             13.000000
Joey Tribbiani      10.304348
Kathy               15.000000
Monica Geller        8.208333
Nurse               11.000000
Phoebe Buffay       12.208333
Rachel Green         9.142857
Ross Geller         10.323529
Scene Directions     9.611111
Timothy Burke        6.250000
Voice               14.500000
Name: text, dtype: float64
20. Transformations with .transform()¶
In [ ]:
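Before applying `.transform()` to the Friends data below, here is a minimal self-contained sketch on toy data (made-up values): unlike `.agg()`, which returns one value per group, `.transform()` returns a result aligned row-for-row with the original DataFrame, which is what makes per-group normalization possible:

```python
import pandas as pd

# Toy dialogue lengths per speaker (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'dialogue_length': [10, 30, 20, 40],
})

# Subtract each speaker's own mean: the output has one value per original row
centered = toy.groupby('speaker')['dialogue_length'].transform(lambda x: x - x.mean())
print(centered.tolist())  # [-10.0, 10.0, -10.0, 10.0]
```

Because the result aligns with the original index, it can be assigned straight back as a new column, exactly as the notebook does with `normalized_length` below.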
21. Dialogue Transformation¶
In [113]:
friends_df.groupby('speaker')['dialogue_length'].transform(lambda x: (x - x.mean()) / x.std())
Out[113]:
0        0.115586
1        0.465502
2        0.374232
3       -0.547378
4       -0.550232
           ...
67368   -0.568622
67369    0.410054
67370   -0.665936
67371   -0.366175
67372   -0.780283
Name: dialogue_length, Length: 67373, dtype: float64
In [112]:
friends_df.groupby('speaker')['dialogue_length']
Out[112]:
<pandas.core.groupby.generic.SeriesGroupBy object at 0x73484242f090>
In [115]:
# Now apply the `transform` method using the `dialogue_length`
friends_df['normalized_length'] = friends_df.groupby('speaker')['dialogue_length'].transform(lambda x: (x - x.mean()) / x.std())
Just for Exploration¶
Dynamics of the Main Six Characters¶
The six main characters of "F.R.I.E.N.D.S" form the heart of the show. Observing how their dialogue contributions vary season by season can provide fans with insights into character development and storyline emphasis. Let’s use a stacked bar chart to visualize each character's dialogue counts per season, illustrating their prominence and interaction within the group.
In [ ]:
# Run the below cell to understand Dynamics of the Main Six Characters
In [117]:
# Filter out the main six characters
main_characters = ['Rachel Green', 'Ross Geller', 'Monica Geller', 'Chandler Bing', 'Joey Tribbiani', 'Phoebe Buffay']
filtered_df = friends_df[friends_df['speaker'].isin(main_characters)]
# Group by season and speaker, then count dialogues
dialogues_per_season = filtered_df.groupby(['season', 'speaker']).size().unstack()
# Define a color palette for the characters
colors = ['#6A0DAD', '#7B2CBF', '#9D4EDD', '#C77DFF', '#D6BCFA', '#E6E6FA'] # Soft pastel colors
# Plotting using a stacked bar chart
dialogues_per_season.plot(kind='bar', stacked=True, figsize=(12, 8), color=colors)
plt.title('Dialogue Contributions of Main Characters Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Dialogues')
plt.legend(title='Character')  # let matplotlib derive labels from the stacked columns (alphabetical order from unstack)
plt.xticks(rotation=0)
plt.show()
Pivoting Grouped Data with .pivot_table()¶
22. Pivoting Data with pivot_table()¶
In [ ]:
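The empty cell above can hold a warm-up. Here is a self-contained `pivot_table()` sketch on toy data (made-up values), mirroring the index/columns/aggfunc pattern used in the Central Perk and Thanksgiving exercises below:

```python
import pandas as pd

# Toy dialogue table (hypothetical lines)
toy = pd.DataFrame({
    'season':  [1, 1, 1, 2],
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'text':    ['a', 'b', 'c', 'd'],
})

# Rows = season, columns = speaker, each cell = number of lines; 0 where a
# speaker has no lines that season (fill_value replaces the NaN)
pivot = toy.pivot_table(values='text', index='season', columns='speaker', aggfunc='count', fill_value=0)
print(pivot)  # Joey: 2 lines in season 1, 0 in season 2; Ross: 1 and 1
```

`pivot_table` is essentially a `groupby(['season', 'speaker']).count()` followed by an `unstack()`, with missing combinations filled in one step.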
23. Central Perk Coffee Talks¶
In [119]:
friends_df[friends_df['text'].str.contains('Central Perk', na=False)]
Out[119]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length | normalized_length |
---|---|---|---|---|---|---|---|---|---|
1445 | (A flashback of Aurora and Chandler on their d... | Scene Directions | 1 | 6 | 3 | 7 | 16 | 89 | 0.073443 |
1631 | Everybody? Shh, shhh. Uhhh... Central Perk is ... | Rachel Green | 1 | 7 | 1 | 1 | 16 | 95 | 0.720546 |
2874 | Central Perk is proud to present Miss Phoebe B... | Rachel Green | 1 | 11 | 9 | 19 | 9 | 52 | -0.021184 |
3100 | [Cut back to Central Perk.] | Scene Directions | 1 | 12 | 8 | 3 | 5 | 27 | -0.540332 |
3105 | [Cut back to Central Perk.] | Scene Directions | 1 | 12 | 8 | 8 | 5 | 27 | -0.540332 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
66185 | [Scene: Central Perk. Phoebe and Mike are leav... | Scene Directions | 10 | 14 | 11 | 0 | 8 | 51 | -0.302742 |
66279 | [Scene: Central Perk. Phoebe's reading a newsp... | Scene Directions | 10 | 15 | 2 | 0 | 12 | 81 | -0.005754 |
66324 | [Scene: Central Perk. Phoebe's reading, Joey h... | Scene Directions | 10 | 15 | 4 | 0 | 11 | 71 | -0.104750 |
66928 | [Scene: Central Perk. Ross, Phoebe and Joey ar... | Scene Directions | 10 | 17 | 7 | 0 | 9 | 55 | -0.263144 |
67074 | [Scene: The street right in front of Central P... | Scene Directions | 10 | 17 | 10 | 0 | 21 | 112 | 0.301134 |
499 rows × 9 columns
In [134]:
# Filter for dialogues occurring in Central Perk
central_perk_talks = friends_df[friends_df['text'].str.contains('Central Perk', case = False)]
In [135]:
# Create a pivot table to count the number of dialogues each character has in Central Perk, grouped by season.
# Use 'season' as the index, 'speaker' as the columns, and 'text' for counting dialogues, with missing values as 0.
central_perk_pivot = central_perk_talks.pivot_table(values='text', index= 'season', columns='speaker', aggfunc='count', fill_value=0)
central_perk_pivot
Out[135]:
season | Chandler Bing | Phoebe Buffay | Rachel Green | Ross Geller | Scene Directions |
---|---|---|---|---|---|
1 | 0 | 0 | 2 | 0 | 3 |
2 | 1 | 2 | 1 | 0 | 50 |
3 | 0 | 0 | 0 | 0 | 64 |
4 | 1 | 0 | 0 | 0 | 55 |
5 | 0 | 0 | 0 | 1 | 53 |
6 | 0 | 0 | 0 | 0 | 75 |
7 | 0 | 0 | 0 | 0 | 62 |
8 | 0 | 0 | 0 | 0 | 46 |
9 | 0 | 0 | 0 | 0 | 43 |
10 | 0 | 0 | 0 | 0 | 40 |
24. The One with All the Thanksgivings¶
In [137]:
# Filter dialogues mentioning "Thanksgiving"
thanksgiving_dialogues = friends_df[friends_df['text'].str.contains('Thanksgiving', case = False)]
In [140]:
# Create a pivot table to count the number of dialogues each main character has in Thanksgiving episodes, grouped by season.
# Use 'season' as the index, 'speaker' as the columns, and 'text' for counting dialogues, with missing values as 0.
thanksgiving_pivot = thanksgiving_dialogues.pivot_table(values = 'text', index = 'season', columns = 'speaker', aggfunc='count', fill_value = 0)
thanksgiving_pivot
Out[140]:
season | Amy Green | Chandler Bing | Janine Lecroix | Joey Tribbiani | Judy Geller | Monica Geller | Mr. Ratstatter | Phoebe Buffay | Rachel Green | Ross Geller | Scene Directions | Tag Jones | The Girls | Timothy Burke | Will Colbert |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 4 | 0 | 1 | 0 | 4 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 1 | 0 |
5 | 0 | 3 | 0 | 1 | 1 | 3 | 0 | 0 | 4 | 6 | 11 | 0 | 0 | 0 | 0 |
6 | 0 | 1 | 2 | 1 | 0 | 1 | 0 | 2 | 1 | 5 | 2 | 0 | 1 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
9 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 3 | 0 | 3 | 0 | 2 | 0 | 3 | 5 | 2 | 0 | 0 | 0 | 0 | 0 |
The One Where We Wrap Up¶