Data Wrangling with Pandas
Unscripted Insights: Data Wrangling with F.R.I.E.N.D.S¶
Starting Off in the Real World of Data with F.R.I.E.N.D.S!¶
In [1]:
# Importing necessary libraries for data manipulation and visualization
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Setting visual preferences for plotting
plt.style.use('ggplot')
# Loading the datasets
friends_df = pd.read_csv('friends.csv')
friends_info_df = pd.read_csv('friends_info.csv')
In [2]:
friends_df.head()
Out[2]:
  | text | speaker | season | episode | scene | utterance |
---|---|---|---|---|---|---|
0 | There's nothing to tell! He's just some guy I ... | Monica Geller | 1 | 1 | 1 | 1 |
1 | C'mon, you're going out with the guy! There's ... | Joey Tribbiani | 1 | 1 | 1 | 2 |
2 | All right Joey, be nice. So does he have a hum... | Chandler Bing | 1 | 1 | 1 | 3 |
3 | Wait, does he eat chalk? | Phoebe Buffay | 1 | 1 | 1 | 4 |
4 | (They all stare, bemused.) | Scene Directions | 1 | 1 | 1 | 5 |
In [3]:
friends_info_df.head()
Out[3]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | The Pilot | James Burrows | David Crane & Marta Kauffman | 1994-09-22 | 21.5 | 8.3 |
1 | 1 | 2 | The One with the Sonogram at the End | James Burrows | David Crane & Marta Kauffman | 1994-09-29 | 20.2 | 8.1 |
2 | 1 | 3 | The One with the Thumb | James Burrows | Jeffrey Astrof & Mike Sikowitz | 1994-10-06 | 19.5 | 8.2 |
3 | 1 | 4 | The One with George Stephanopoulos | James Burrows | Alexa Junge | 1994-10-13 | 19.7 | 8.1 |
4 | 1 | 5 | The One with the East German Laundry Detergent | Pamela Fryman | Jeff Greenstein & Jeff Strauss | 1994-10-20 | 18.6 | 8.5 |
In [4]:
friends_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 67373 entries, 0 to 67372
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   text       67373 non-null  object
 1   speaker    67097 non-null  object
 2   season     67373 non-null  int64
 3   episode    67373 non-null  int64
 4   scene      67373 non-null  int64
 5   utterance  67373 non-null  int64
dtypes: int64(4), object(2)
memory usage: 3.1+ MB
In [5]:
friends_info_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 236 entries, 0 to 235
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   season             236 non-null    int64
 1   episode            236 non-null    int64
 2   title              236 non-null    object
 3   directed_by        236 non-null    object
 4   written_by         236 non-null    object
 5   air_date           236 non-null    object
 6   us_views_millions  236 non-null    float64
 7   imdb_rating        236 non-null    float64
dtypes: float64(2), int64(2), object(4)
memory usage: 14.9+ KB
Let's Start!¶
Using Built-in Aggregation Functions¶
1. Who Talks the Most?¶
In [8]:
# Use the 'speaker' and 'text' columns to count the number of dialogues per character
friends_df.groupby('speaker')['text'].count().idxmax()
Out[8]:
'Rachel Green'
2. Seasonal Dialogue Sum¶
In [9]:
friends_df['text'].str.split()
Out[9]:
0        [There's, nothing, to, tell!, He's, just, some...
1        [C'mon,, you're, going, out, with, the, guy!, ...
2        [All, right, Joey,, be, nice., So, does, he, h...
3                            [Wait,, does, he, eat, chalk?]
4                           [(They, all, stare,, bemused.)]
                                ...
67368                         [Oh,, it's, gonna, be, okay.]
67369    [Do, you, guys, have, to, go, to, the, new, ho...
67370                                [We, got, some, time.]
67371               [Okay,, should, we, get, some, coffee?]
67372                                       [Sure., Where?]
Name: text, Length: 67373, dtype: object
In [11]:
# First, calculate the word count for each dialogue
friends_df['word_count'] = friends_df['text'].str.split().apply(len)
friends_df['word_count']
Out[11]:
0        11
1        14
2        16
3         5
4         4
         ..
67368     5
67369    18
67370     4
67371     6
67372     2
Name: word_count, Length: 67373, dtype: int64
In [12]:
# Now sum these word counts by season and store the result in the variable `seasonal_word_sum`
seasonal_word_sum = friends_df.groupby('season')['word_count'].sum()
seasonal_word_sum
Out[12]:
season
1     65205
2     64129
3     75710
4     74817
5     74405
6     77765
7     73662
8     71555
9     75523
10    55237
Name: word_count, dtype: int64
3. Average Episode Length¶
In [14]:
# Count the number of unique scene values per season, then take the mean across seasons
friends_df.groupby('season')['scene'].nunique().mean()
Out[14]:
19.5
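The cell above averages unique scene numbers per season. If you want the mean number of unique scenes *per episode* instead, group by both `season` and `episode` first. A minimal self-contained sketch (the column names mirror the notebook, but the values below are made up):

```python
import pandas as pd

# Toy dialogue table: scene numbers restart within each episode (made-up values)
toy = pd.DataFrame({
    'season':  [1, 1, 1, 1, 1, 1],
    'episode': [1, 1, 1, 2, 2, 2],
    'scene':   [1, 2, 2, 1, 1, 1],
})

# Unique scenes per (season, episode) pair, then the mean across all episodes
avg_scenes = toy.groupby(['season', 'episode'])['scene'].nunique().mean()
print(avg_scenes)  # episode 1 has 2 unique scenes, episode 2 has 1 -> mean 1.5
```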
4. Shortest and Longest Dialogues¶
In [15]:
# First, compute the length of each dialogue
friends_df['dialogue_length'] = friends_df['text'].str.len()
friends_df['dialogue_length']
Out[15]:
0        56
1        80
2        72
3        24
4        26
         ..
67368    23
67369    77
67370    17
67371    32
67372    12
Name: dialogue_length, Length: 67373, dtype: int64
In [16]:
# Now find the minimum and maximum dialogue length for each character using agg()
dialogue_lengths = friends_df.groupby('speaker')['dialogue_length'].agg(['min', 'max'])
dialogue_lengths
Out[16]:
speaker | min | max |
---|---|---|
#ALL# | 2 | 73 |
1st Customer | 25 | 25 |
A Casino Boss | 52 | 52 |
A Crew Member | 14 | 16 |
A Disembodied Voice | 15 | 15 |
... | ... | ... |
Woman On Tv | 19 | 80 |
Woman's Voice | 74 | 74 |
Writer | 71 | 71 |
Zack | 5 | 165 |
Zoe | 30 | 30 |
699 rows × 2 columns
5. Comprehensive Character Stats¶
In [18]:
# Use all the aggregation functions: 'mean', 'std', 'min', 'max', 'median'
char_stats = friends_df.groupby('speaker')['dialogue_length'].agg(['mean', 'std', 'min', 'max', 'median'])
char_stats
Out[18]:
speaker | mean | std | min | max | median |
---|---|---|---|---|---|
#ALL# | 12.028818 | 11.069892 | 2 | 73 | 8.0 |
1st Customer | 25.000000 | NaN | 25 | 25 | 25.0 |
A Casino Boss | 52.000000 | NaN | 52 | 52 | 52.0 |
A Crew Member | 15.333333 | 1.154701 | 14 | 16 | 16.0 |
A Disembodied Voice | 15.000000 | NaN | 15 | 15 | 15.0 |
... | ... | ... | ... | ... | ... |
Woman On Tv | 41.000000 | 33.867388 | 19 | 80 | 24.0 |
Woman's Voice | 74.000000 | NaN | 74 | 74 | 74.0 |
Writer | 71.000000 | NaN | 71 | 71 | 71.0 |
Zack | 41.300000 | 39.095161 | 5 | 165 | 38.5 |
Zoe | 30.000000 | NaN | 30 | 30 | 30.0 |
699 rows × 5 columns
6. Understanding .groupby() Function¶
In [ ]:
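Since this cell is left empty, here is a minimal, self-contained sketch of what `.groupby()` does, using a made-up toy DataFrame rather than the `friends.csv` data. Conceptually it follows the split-apply-combine pattern: split rows into groups by a key, apply a function to each group, and combine the results:

```python
import pandas as pd

# Toy data standing in for the dialogue table (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Monica', 'Joey', 'Monica', 'Joey', 'Ross'],
    'words':   [11, 14, 16, 5, 4],
})

# Split by speaker, apply sum to each group's 'words', combine into one Series
words_per_speaker = toy.groupby('speaker')['words'].sum()
print(words_per_speaker)  # Monica: 11 + 16 = 27, Joey: 19, Ross: 4
```

The intermediate `toy.groupby('speaker')` object is lazy: nothing is computed until an aggregation (like `.sum()`, `.count()`, or `.agg()`) is applied.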
Just for Exploration¶
Which Episode Won the Hearts of the Audience?¶
F.R.I.E.N.D.S has always been more than just a TV show. Each episode is like catching up with old friends!
As we delve into the data, let's find out which episode truly stood out as the fan favorite. We'll analyze views and IMDb ratings to uncover which storylines and moments captivated viewers the most.
In [20]:
# Calculate a score by combining views and IMDb ratings
friends_info_df['score'] = friends_info_df['us_views_millions'] + friends_info_df['imdb_rating']
# Get the top 10 episodes by score
top_episodes = friends_info_df.sort_values(by='score', ascending=False).head(10)
# Set up the matplotlib figure
plt.figure(figsize=(8, 4))
sns.barplot(x='score', y='title', data=top_episodes, palette='viridis')
# Add titles and labels
plt.title('Top 10 Most Loved Episodes of Friends', fontsize=16)
plt.xlabel('Score (US Views Millions + IMDb Rating)', fontsize=14)
plt.ylabel('Episode Title', fontsize=14)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
# Display the plot
plt.show()
Our analysis leads us to "The Last One," the series finale. This finale beautifully encapsulated the essence of friendship and change, leaving a lasting impression and securing its place as a perennial favorite among the show’s fans.
Defining and Using Custom Aggregation Functions with .agg()¶
7. Custom Aggregation: Unique Words¶
In [21]:
# Define a function to count unique words per character's dialogues
def unique_word_count(texts):
return len(set(' '.join(texts).split()))
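Before applying it to the full dataset, a quick sanity check of `unique_word_count` on a toy list of dialogues helps; note that the function is case- and punctuation-sensitive, so "Joey" and "joey" would count as different words:

```python
# Same helper as above: join all dialogues, split on whitespace, count distinct tokens
def unique_word_count(texts):
    return len(set(' '.join(texts).split()))

# Toy dialogues (made up): 'how' and 'you' repeat, so only 4 words are unique
sample = ['how you doin', 'how are you']
print(unique_word_count(sample))  # {'how', 'you', 'doin', 'are'} -> 4
```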
In [22]:
# Group dialogues by speaker and aggregate them with the custom unique-word counter
unique_words_per_character = friends_df.groupby('speaker')['text'].agg(unique_word_count)
unique_words_per_character
Out[22]:
speaker
#ALL#                  394
1st Customer             3
A Casino Boss            9
A Crew Member            9
A Disembodied Voice      2
                      ...
Woman On Tv             22
Woman's Voice           13
Writer                  12
Zack                   119
Zoe                      4
Name: text, Length: 699, dtype: int64
8. Phoebe’s Family History¶
In [24]:
def most_frequent_mentions(dialogues):
words = pd.Series(' '.join(dialogues).split())
return words[words.str.lower().isin(['mom', 'dad', 'sister', 'brother', 'family'])].value_counts().idxmax()
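A quick check of `most_frequent_mentions` on toy dialogues (made up, not from the dataset): the match is case-insensitive, but the returned word keeps whichever casing was most frequent, since `value_counts()` runs on the original tokens:

```python
import pandas as pd

# Same helper as above: find the most frequent family-related word in a list of dialogues
def most_frequent_mentions(dialogues):
    words = pd.Series(' '.join(dialogues).split())
    return words[words.str.lower().isin(['mom', 'dad', 'sister', 'brother', 'family'])].value_counts().idxmax()

# Toy dialogues: 'mom' appears 3 times, 'dad' twice
sample = ['my mom called', 'mom and dad', 'tell dad', 'mom again']
print(most_frequent_mentions(sample))  # -> 'mom'
```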
In [25]:
# Apply this function to group dialogue by season and aggregate most frequent mentions
family_mentions = friends_df[friends_df['speaker'] == 'Phoebe Buffay'].groupby('season')['text'].agg(most_frequent_mentions)
family_mentions
Out[25]:
season
1        mom
2        dad
3        Mom
4        mom
5        mom
6     sister
7        mom
8        dad
9     family
10    family
Name: text, dtype: object
9. Using Aggregation Functions¶
In [ ]:
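This placeholder cell is a natural spot to combine built-in and custom functions in a single `.agg()` call. A self-contained sketch on toy data (made-up values, not the Friends dialogues): when you pass a list mixing string names and callables, the custom function's `__name__` becomes its column label:

```python
import pandas as pd

# Toy dialogue lengths per speaker (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'dialogue_length': [10, 30, 20, 40],
})

# Custom aggregation: spread between longest and shortest line
def length_range(s):
    return s.max() - s.min()

# Mix built-in names with the custom callable in one agg() call
stats = toy.groupby('speaker')['dialogue_length'].agg(['mean', 'max', length_range])
print(stats)  # columns: mean, max, length_range
```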
Grouping by Multiple Columns¶
10. The One Where Joey Speaks¶
In [27]:
# Use the 'speaker', 'text' and 'season' columns to count the number of dialogues by Joey per season
joey_lines = friends_df[friends_df['speaker'] == 'Joey Tribbiani'].groupby('season')['text'].count()
joey_lines
Out[27]:
season
1     640
2     654
3     774
4     838
5     935
6     909
7     933
8     909
9     856
10    723
Name: text, dtype: int64
11. Chandler's Job Mystery¶
In [30]:
# Step 1: Filter rows containing "job" or "work"
job_work_mentions = friends_df[friends_df['text'].str.lower().str.contains('job|work', regex=True)]
In [32]:
# Step 2: Filter rows where speaker is Chandler Bing
chandler_mentions = job_work_mentions[job_work_mentions['speaker'] == 'Chandler Bing']
In [33]:
# Step 3: Group by season
grouped_by_season = chandler_mentions.groupby('season')
In [34]:
# Step 4: Count the number of rows in each group
chandler_job_explanations = grouped_by_season.size()
chandler_job_explanations
Out[34]:
season
1     17
2     17
3     12
4     11
5     14
6     19
7     12
8     17
9     31
10    13
dtype: int64
Just for Exploration¶
The Dynamics of Ross and Rachel’s Relationship¶
Ross and Rachel's relationship is a central storyline in the show. Their conversations show how their relationship changes over time, from beginning to end. By looking at their dialogues across all the seasons, you can see the highs and lows of their connection. Let’s map out their dialogues to understand how their story develops and changes throughout the series.
In [51]:
# Run the below cell to try to understand this relationship!
In [35]:
# Filter dialogues between Ross and Rachel
ross_rachel_dialogues = friends_df[(friends_df['speaker'] == 'Ross Geller') & (friends_df['text'].str.contains('Rachel')) |
(friends_df['speaker'] == 'Rachel Green') & (friends_df['text'].str.contains('Ross'))]
# Group by season and count dialogues
dialogues_per_season = ross_rachel_dialogues.groupby('season').size()
# Plotting
plt.figure(figsize=(10, 6))
dialogues_per_season.plot(kind='line', marker='o', linestyle='-', color='blue')
plt.title('Ross and Rachel Dialogues Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Dialogues')
plt.grid(True)
plt.show()
12. Flashback Flashes¶
In [37]:
# Identify episodes with frequent flashbacks by searching for phrases like 'remember when' and 'back when'.
# Then group and count these instances by season and episode, highlighting how the series revisits its past.
flashbacks = friends_df[friends_df['text'].str.contains('remember when|back when', case = False)]
In [38]:
flashback_mentions = flashbacks.groupby(['season', 'episode']).size()
flashback_mentions
Out[38]:
season  episode
1       1          1
        2          1
        3          1
        6          1
        13         1
        19         1
2       4          1
        6          1
        12         1
        14         1
        16         1
        24         1
3       3          1
        10         1
        13         1
        15         1
        17         1
4       8          1
        23         1
5       2          1
        5          1
        6          1
        8          1
        10         2
6       6          1
        7          2
        10         1
        17         1
        18         1
7       2          1
        3          1
        6          1
        15         1
8       5          1
        18         1
9       6          1
        24         1
dtype: int64
13. The One with the Catchphrases¶
In [40]:
# Note: the trailing "?" is a regex quantifier here, making the apostrophe optional,
# so this also matches "how you doin" without the apostrophe
catchphrase = friends_df[friends_df['text'].str.contains("how you doin'?", case=False)]
In [41]:
# Use groupby() over `speaker` column
joey_catchphrases = catchphrase.groupby('speaker').size()
joey_catchphrases
Out[41]:
speaker
Dana Keystone        1
Dr. Franzblau        1
Frank Buffay Jr.     1
Joey Tribbiani      25
Monica Geller        2
Rachel Green         4
Ross Geller          4
Susan Bunch          1
Susie Moss           1
Tag Jones            1
dtype: int64
14. Ross's Weddings¶
In [43]:
# Filter data where text mentions 'wedding' and 'speaker' is Ross Geller
ross_weddings = friends_df[(friends_df['text'].str.contains("wedding", case = False)) & (friends_df['speaker'] == 'Ross Geller')]
In [44]:
# Group ross_weddings by season and episode, and count entries
episode_counts = ross_weddings.groupby(['season', 'episode']).size()
In [45]:
# Find the episode with the most dialogues about weddings
max_wedding_episode = episode_counts.idxmax()
In [46]:
# Print the result
print(f"Ross's wedding episode with the most dialogue is in Season {max_wedding_episode[0]}, Episode {max_wedding_episode[1]} with {episode_counts[max_wedding_episode]} mentions.")
Ross's wedding episode with the most dialogue is in Season 4, Episode 23 with 6 mentions.
Applying Functions to Groups with .apply() and .transform()¶
15. Filtering with filter()¶
In [ ]:
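Section 15's cell is left empty; as a self-contained sketch (toy data with made-up lines, not the Friends dialogues), `.filter()` keeps or drops *whole groups* based on a condition evaluated on each group, returning rows in their original shape rather than an aggregate:

```python
import pandas as pd

# Toy dialogue table (hypothetical lines)
toy = pd.DataFrame({
    'speaker': ['Gunther', 'Joey', 'Joey', 'Joey'],
    'text': ['Rachel!', 'How you doin', 'Pizza', 'Sandwich'],
})

# Keep only speakers with more than one line; Gunther's single line is dropped
talkative = toy.groupby('speaker').filter(lambda g: len(g) > 1)
print(talkative)  # only Joey's three rows remain
```

Unlike `.agg()`, the result is a plain DataFrame with the surviving original rows, so you can keep wrangling it with the usual tools.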
16. Monica's Cleaning Episodes¶
In [ ]:
In [ ]:
friends_df[friends_df['speaker'] == 'Monica Geller']
In [51]:
friends_df[friends_df['text'].str.contains('clean|dust|soap', case=False)]
Out[51]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length |
---|---|---|---|---|---|---|---|---|
30 | No, no don't! Stop cleansing my aura! No, just... | Ross Geller | 1 | 1 | 1 | 31 | 14 | 73 |
141 | I know, I know, I'm such an idiot. I guess I s... | Paul the Wine Guy | 1 | 1 | 5 | 2 | 35 | 165 |
262 | You're welcome. I remember when I first came t... | Phoebe Buffay | 1 | 1 | 14 | 6 | 72 | 390 |
451 | All right, you guys, I kinda gotta clean up now. | Rachel Green | 1 | 2 | 5 | 10 | 10 | 48 |
455 | (Joey turns off the lights, and they all leave... | Scene Directions | 1 | 2 | 5 | 14 | 20 | 108 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
66235 | I mean, this soap opera is a great gig, but...... | Joey Tribbiani | 10 | 14 | 13 | 1 | 36 | 192 |
66588 | I forgot to pick up my dry cleaning! | Joey Tribbiani | 10 | 16 | 4 | 27 | 8 | 36 |
66720 | Well, I was cleaning out the closet and I foun... | Chandler Bing | 10 | 16 | 12 | 18 | 16 | 84 |
66883 | We'll just get him cleaned up a bit. | Nurse | 10 | 17 | 5 | 22 | 8 | 36 |
67110 | Damn, that window is clean. | Phoebe Buffay | 10 | 17 | 12 | 16 | 5 | 27 |
192 rows × 8 columns
In [53]:
# Filter rows where Monica Geller is the speaker, then group by season and count lines
# mentioning "clean", "dust", or "soap" (note: this match is case-sensitive)
cleaning_mentions = friends_df[(friends_df['speaker'] == 'Monica Geller') & (friends_df['text'].str.contains('clean|dust|soap', case=True))].groupby('season')['text'].count()
cleaning_mentions
Out[53]:
season
1     1
2     2
4     3
5     3
6     2
7     1
8     8
9     4
10    4
Name: text, dtype: int64
In [54]:
# Alternative: sum occurrences with str.count(), so a line mentioning "clean" twice counts twice
cleaning_mentions = friends_df[friends_df['speaker'] == 'Monica Geller'].groupby('season')['text'].apply(lambda x: x.str.count('clean|dust|soap').sum())
cleaning_mentions
Out[54]:
season
1     1
2     2
3     0
4     3
5     4
6     2
7     1
8     9
9     6
10    4
Name: text, dtype: int64
Just for Exploration¶
Coffee Time Trends at Central Perk¶
Central Perk is more than just a coffee shop in "F.R.I.E.N.D.S"; it's where countless iconic moments unfolded. Ever wondered how the gang's visits to Central Perk changed over the seasons? Let’s visualize the number of scenes set in Central Perk per season to see how the group's coffee habits evolved.
In [ ]:
# Run the below cell to understand Coffee Time Trends at Central Perk
In [56]:
# Filter for scenes at Central Perk
central_perk_scenes = friends_df[friends_df['text'].str.contains('Central Perk', na=False)]
# Group by season and count the number of scenes
scenes_per_season = central_perk_scenes.groupby('season').size()
# Plotting
plt.figure(figsize=(10, 6))
scenes_per_season.plot(kind='bar', color='brown', alpha=0.7)
plt.title('Number of Central Perk Scenes Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Scenes')
plt.xticks(rotation=0)
plt.show()
17. The One with the Longest Monologue¶
In [ ]:
In [64]:
# For each season, find the row index of the longest dialogue (maximum string length in 'text')
max_dialogue_length = friends_df.groupby('season')['text'].apply(lambda x: x.str.len().idxmax())
max_dialogue_length
Out[64]:
season
1      3700
2      6627
3     14558
4     26127
5     28882
6     35444
7     47565
8     53426
9     57472
10    63950
Name: text, dtype: int64
In [67]:
friends_df.loc[63950, 'text']
Out[67]:
"[Scene: Monica and Chandler's. They are preparing to show Laura around. Laura is standing with her back to the window, Chandler and Monica are standing on either side of her, facing each other. Laura: Well, I must say, this seems like a lovely environment to raise a child in. Monica: Oh, by the way, you are more than welcome to look under any of the furniture, because, believe me, you won't find any porn or cigarettes under there! Laura: Oh! Well, actually, before we look around, let me make sure I have everything I need up to here... (She starts checking her form. Chandler sees movement near the window from the corner of his eye and when he looks he spots Joey climbing up the fire escape and onto their balcony. He warns Monica silently.) Monica: (Pulls Laura into the spare room) Why don't I show you the baby's room? (Joey enters through the side window and jogs towards the kitchen holding a baseball bat) Chandler: What the hell are you doing? Joey: Well, you wouldn't let me in, so I thought you were in trouble. Chandler: Well, we're not. Joey: But you called me 'Bert'!? That's our code word for danger! Chandler: We don't have a code word. Joey: We don't? We really should. From now on, 'Bert' will be our code word for danger. (Monica talks loudly in the baby's room) Monica: So that was the baby's room. (They come out and Chandler throws Joey behind the couch and puts his foot on him. Monica looks at Chandler) Monica: (To Chandler) What room should we see next? Chandler: Any room that isn't behind this couch! (laughs nervously) Monica: (laughs nervously as well, Laura looks confused) (To Laura) Some people don't get him, but I think he's really funny! (She takes Laura to their own bedroom). (Joey gets up and look annoyed) Joey: (quivering with anger) I did not care for that! Chandler: (escorting Joey to the door) You have to get out of here. You slept with our social worker and you never called her back and she is still pissed, so she can't see you. Joey: Ok, ok! 
(He leaves) Chandler: Ok! (Joey leaves and closes the door behind him. Chandler walks towards the living room, but then Joey enters again.) Chandler: What? Joey: I forgot my bat. (He picks up his bat and holds it up, but then Monica and Laura enter the living room again. When Laura sees Joey, she freezes...) Laura: Oh my God! Chandler: And for the last time, we do not want to be friends with you! And we don't want to buy your bat! (Joey lowers his bat) Laura: What are you doing here? Joey: (to Chandler) Bert! Bert! Bert! Bert! Laura: Are you friends with him? Chandler: I can explain... Joey... Joey: Uhm... ok... uhm... Well, yeah... You have got some nerve, coming back here. I can't believe you never called me. Laura: Excuse me? Joey: Oh... yeah... Probably you don't even remember my name. It's Joey, by the way. And don't bother telling me yours, because I totally remember it... lady. Yeah! I waited weeks for you to call me. Laura: I gave you my number, you never called me. Joey: No, no! Don't try to turn this around on me, ok? I'm not some kind of... social work, ok, that you can just... do. Laura: (embarrassed towards Chandler and Monica) Well, I'm pretty sure I gave you my number. Joey: Really? Think about it. Come on! You're a beautiful woman, smart, funny, we had a really good time, huh? If I had your number, why wouldn't I call you? Laura: I don't know... Well, maybe I'm wrong... I'm sorry... Joey: No, no, hey, no! Too late for apologies... ok? You broke my heart. You know how many women I had to sleep with to get over you? (and he leaves the apartment, leaving her shocked) Laura: Joey, wait! Joey: (acting sad) NO! I waited a long time, I can't wait anymore... (and closes the door behind him) Laura: (laughing nervously) I'm sorry that you had to see that. I'm so embarrassed... Chandler: Oh, that's really ok. Monica: Yeah, that we totally understand. Dating is hard. Laura: Boy, you people are nice... And I've got to say... 
I think you're going to make excellent parents. (Chandler and Monica hug each other, and then Joey enters the apartment again.) Joey: LAURA! (and points to her, very confident)[Scene: The New York City Children's Fund building. Phoebe and Mike are entering.]"
18. The One with the Routine¶
In [68]:
friends_info_df.head()
Out[68]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating | score |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | The Pilot | James Burrows | David Crane & Marta Kauffman | 1994-09-22 | 21.5 | 8.3 | 29.8 |
1 | 1 | 2 | The One with the Sonogram at the End | James Burrows | David Crane & Marta Kauffman | 1994-09-29 | 20.2 | 8.1 | 28.3 |
2 | 1 | 3 | The One with the Thumb | James Burrows | Jeffrey Astrof & Mike Sikowitz | 1994-10-06 | 19.5 | 8.2 | 27.7 |
3 | 1 | 4 | The One with George Stephanopoulos | James Burrows | Alexa Junge | 1994-10-13 | 19.7 | 8.1 | 27.8 |
4 | 1 | 5 | The One with the East German Laundry Detergent | Pamela Fryman | Jeff Greenstein & Jeff Strauss | 1994-10-20 | 18.6 | 8.5 | 27.1 |
In [73]:
# Find seasons whose episode titles mention 'Routine' or 'Dance' and count those episodes
# Use the friends_info_df!
dance_music_dialogues = friends_info_df[friends_info_df['title'].apply(lambda x: 'Routine' in x or 'Dance' in x)].groupby('season').size()
dance_music_dialogues
Out[73]:
season
6    1
dtype: int64
19. Chandler in a Box¶
In [80]:
friends_df.head()
Out[80]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length |
---|---|---|---|---|---|---|---|---|
0 | There's nothing to tell! He's just some guy I ... | Monica Geller | 1 | 1 | 1 | 1 | 11 | 56 |
1 | C'mon, you're going out with the guy! There's ... | Joey Tribbiani | 1 | 1 | 1 | 2 | 14 | 80 |
2 | All right Joey, be nice. So does he have a hum... | Chandler Bing | 1 | 1 | 1 | 3 | 16 | 72 |
3 | Wait, does he eat chalk? | Phoebe Buffay | 1 | 1 | 1 | 4 | 5 | 24 |
4 | (They all stare, bemused.) | Scene Directions | 1 | 1 | 1 | 5 | 4 | 26 |
In [100]:
len(['Hey', '', "it's", 'me.', 'I', 'know', 'you', "can't", 'stand'])
Out[100]:
9
In [77]:
# First, let's filter out the episode using the `title` column of the `friends_info_df` dataset.
# Episode : 'The One with Chandler in a Box'
chandler_box = friends_info_df[friends_info_df['title'] == 'The One with Chandler in a Box']
chandler_box
Out[77]:
  | season | episode | title | directed_by | written_by | air_date | us_views_millions | imdb_rating | score |
---|---|---|---|---|---|---|---|---|---|
80 | 4 | 8 | The One with Chandler in a Box | Peter Bonerz | Michael Borkow | 1997-11-20 | 26.8 | 9.1 | 35.9 |
In [107]:
# Calculate the average number of words spoken by each character in 'The One with Chandler in a Box'
# by grouping the dialogue by speaker and applying a function to count words per dialogue.
# Hint: use the season and episode numbers obtained above to filter the `friends_df` dataset.
avg_words_per_character = friends_df[(friends_df['season'] == 4) & (friends_df['episode'] == 8)].groupby('speaker')['text'].apply(lambda x: x.str.split().apply(len).mean())
avg_words_per_character
Out[107]:
speaker
Chandler Bing       13.022727
Doctor               8.000000
Gunther             13.000000
Joey Tribbiani      10.304348
Kathy               15.000000
Monica Geller        8.208333
Nurse               11.000000
Phoebe Buffay       12.208333
Rachel Green         9.142857
Ross Geller         10.323529
Scene Directions     9.611111
Timothy Burke        6.250000
Voice               14.500000
Name: text, dtype: float64
20. Transformations with .transform()¶
In [ ]:
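Before applying `.transform()` to the Friends data below, here is a minimal self-contained sketch on toy data (made-up values): unlike `.agg()`, which returns one value per group, `.transform()` returns a result aligned row-for-row with the original DataFrame, which is what makes per-group normalization possible:

```python
import pandas as pd

# Toy dialogue lengths per speaker (hypothetical values)
toy = pd.DataFrame({
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'dialogue_length': [10, 30, 20, 40],
})

# Subtract each speaker's own mean: the output has one value per original row
centered = toy.groupby('speaker')['dialogue_length'].transform(lambda x: x - x.mean())
print(centered.tolist())  # [-10.0, 10.0, -10.0, 10.0]
```

Because the result aligns with the original index, it can be assigned straight back as a new column, exactly as the notebook does with `normalized_length` below.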
21. Dialogue Transformation¶
In [113]:
friends_df.groupby('speaker')['dialogue_length'].transform(lambda x: (x - x.mean()) / x.std())
Out[113]:
0        0.115586
1        0.465502
2        0.374232
3       -0.547378
4       -0.550232
           ...
67368   -0.568622
67369    0.410054
67370   -0.665936
67371   -0.366175
67372   -0.780283
Name: dialogue_length, Length: 67373, dtype: float64
In [112]:
friends_df.groupby('speaker')['dialogue_length']
Out[112]:
<pandas.core.groupby.generic.SeriesGroupBy object at 0x73484242f090>
In [115]:
# Now apply the `transform` method using the `dialogue_length`
friends_df['normalized_length'] = friends_df.groupby('speaker')['dialogue_length'].transform(lambda x: (x - x.mean()) / x.std())
Just for Exploration¶
Dynamics of the Main Six Characters¶
The six main characters of "F.R.I.E.N.D.S" form the heart of the show. Observing how their dialogue contributions vary season by season can provide fans with insights into character development and storyline emphasis. Let’s use a stacked bar chart to visualize each character's dialogue counts per season, illustrating their prominence and interaction within the group.
In [ ]:
# Run the below cell to understand Dynamics of the Main Six Characters
In [117]:
# Filter out the main six characters
main_characters = ['Rachel Green', 'Ross Geller', 'Monica Geller', 'Chandler Bing', 'Joey Tribbiani', 'Phoebe Buffay']
filtered_df = friends_df[friends_df['speaker'].isin(main_characters)]
# Group by season and speaker, then count dialogues
dialogues_per_season = filtered_df.groupby(['season', 'speaker']).size().unstack()
# Define a color palette for the characters
colors = ['#6A0DAD', '#7B2CBF', '#9D4EDD', '#C77DFF', '#D6BCFA', '#E6E6FA'] # Soft pastel colors
# Plotting using a stacked bar chart
dialogues_per_season.plot(kind='bar', stacked=True, figsize=(12, 8), color=colors)
plt.title('Dialogue Contributions of Main Characters Per Season')
plt.xlabel('Season')
plt.ylabel('Number of Dialogues')
plt.legend(title='Character')  # let matplotlib derive labels from the stacked columns (alphabetical order from unstack)
plt.xticks(rotation=0)
plt.show()
Pivoting Grouped Data with .pivot_table()¶
22. Pivoting Data with pivot_table()¶
In [ ]:
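The empty cell above can hold a warm-up. Here is a self-contained `pivot_table()` sketch on toy data (made-up values), mirroring the index/columns/aggfunc pattern used in the Central Perk and Thanksgiving exercises below:

```python
import pandas as pd

# Toy dialogue table (hypothetical lines)
toy = pd.DataFrame({
    'season':  [1, 1, 1, 2],
    'speaker': ['Joey', 'Joey', 'Ross', 'Ross'],
    'text':    ['a', 'b', 'c', 'd'],
})

# Rows = season, columns = speaker, each cell = number of lines; 0 where a
# speaker has no lines that season (fill_value replaces the NaN)
pivot = toy.pivot_table(values='text', index='season', columns='speaker', aggfunc='count', fill_value=0)
print(pivot)  # Joey: 2 lines in season 1, 0 in season 2; Ross: 1 and 1
```

`pivot_table` is essentially a `groupby(['season', 'speaker']).count()` followed by an `unstack()`, with missing combinations filled in one step.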
23. Central Perk Coffee Talks¶
In [119]:
friends_df[friends_df['text'].str.contains('Central Perk', na=False)]
Out[119]:
  | text | speaker | season | episode | scene | utterance | word_count | dialogue_length | normalized_length |
---|---|---|---|---|---|---|---|---|---|
1445 | (A flashback of Aurora and Chandler on their d... | Scene Directions | 1 | 6 | 3 | 7 | 16 | 89 | 0.073443 |
1631 | Everybody? Shh, shhh. Uhhh... Central Perk is ... | Rachel Green | 1 | 7 | 1 | 1 | 16 | 95 | 0.720546 |
2874 | Central Perk is proud to present Miss Phoebe B... | Rachel Green | 1 | 11 | 9 | 19 | 9 | 52 | -0.021184 |
3100 | [Cut back to Central Perk.] | Scene Directions | 1 | 12 | 8 | 3 | 5 | 27 | -0.540332 |
3105 | [Cut back to Central Perk.] | Scene Directions | 1 | 12 | 8 | 8 | 5 | 27 | -0.540332 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
66185 | [Scene: Central Perk. Phoebe and Mike are leav... | Scene Directions | 10 | 14 | 11 | 0 | 8 | 51 | -0.302742 |
66279 | [Scene: Central Perk. Phoebe's reading a newsp... | Scene Directions | 10 | 15 | 2 | 0 | 12 | 81 | -0.005754 |
66324 | [Scene: Central Perk. Phoebe's reading, Joey h... | Scene Directions | 10 | 15 | 4 | 0 | 11 | 71 | -0.104750 |
66928 | [Scene: Central Perk. Ross, Phoebe and Joey ar... | Scene Directions | 10 | 17 | 7 | 0 | 9 | 55 | -0.263144 |
67074 | [Scene: The street right in front of Central P... | Scene Directions | 10 | 17 | 10 | 0 | 21 | 112 | 0.301134 |
499 rows × 9 columns
In [134]:
# Filter for dialogues occurring in Central Perk
central_perk_talks = friends_df[friends_df['text'].str.contains('Central Perk', case = False)]
In [135]:
# Create a pivot table to count the number of dialogues each character has in Central Perk, grouped by season.
# Use 'season' as the index, 'speaker' as the columns, and 'text' for counting dialogues, with missing values as 0.
central_perk_pivot = central_perk_talks.pivot_table(values='text', index= 'season', columns='speaker', aggfunc='count', fill_value=0)
central_perk_pivot
Out[135]:
season | Chandler Bing | Phoebe Buffay | Rachel Green | Ross Geller | Scene Directions |
---|---|---|---|---|---|
1 | 0 | 0 | 2 | 0 | 3 |
2 | 1 | 2 | 1 | 0 | 50 |
3 | 0 | 0 | 0 | 0 | 64 |
4 | 1 | 0 | 0 | 0 | 55 |
5 | 0 | 0 | 0 | 1 | 53 |
6 | 0 | 0 | 0 | 0 | 75 |
7 | 0 | 0 | 0 | 0 | 62 |
8 | 0 | 0 | 0 | 0 | 46 |
9 | 0 | 0 | 0 | 0 | 43 |
10 | 0 | 0 | 0 | 0 | 40 |
24. The One with All the Thanksgivings¶
In [137]:
# Filter dialogues mentioning "Thanksgiving"
thanksgiving_dialogues = friends_df[friends_df['text'].str.contains('Thanksgiving', case = False)]
In [140]:
# Create a pivot table to count the number of dialogues each main character has in Thanksgiving episodes, grouped by season.
# Use 'season' as the index, 'speaker' as the columns, and 'text' for counting dialogues, with missing values as 0.
thanksgiving_pivot = thanksgiving_dialogues.pivot_table(values = 'text', index = 'season', columns = 'speaker', aggfunc='count', fill_value = 0)
thanksgiving_pivot
Out[140]:
season | Amy Green | Chandler Bing | Janine Lecroix | Joey Tribbiani | Judy Geller | Monica Geller | Mr. Ratstatter | Phoebe Buffay | Rachel Green | Ross Geller | Scene Directions | Tag Jones | The Girls | Timothy Burke | Will Colbert |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 4 | 0 | 1 | 0 | 4 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 0 | 0 | 1 | 0 |
5 | 0 | 3 | 0 | 1 | 1 | 3 | 0 | 0 | 4 | 6 | 11 | 0 | 0 | 0 | 0 |
6 | 0 | 1 | 2 | 1 | 0 | 1 | 0 | 2 | 1 | 5 | 2 | 0 | 1 | 0 | 0 |
7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 2 | 2 | 1 | 0 | 0 | 0 |
8 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
9 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
10 | 0 | 3 | 0 | 3 | 0 | 2 | 0 | 3 | 5 | 2 | 0 | 0 | 0 | 0 | 0 |
The One Where We Wrap Up¶