Statement of Completion#ec1a9e03
Intro to Pandas for Data Analysis
medium
Bartender's Blueprint: Series Operations on Cocktail Concoctions
Introduction¶
In [2]:
# importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# load the dataset
df = pd.read_csv('cocktails_recipe.csv')
In [3]:
df.head()
Out[3]:
| title | glass | garnish | recipe | ingredients |
---|---|---|---|---|---|
0 | Abacaxi Ricaço | Pineapple shell (frozen) glass | Cut a straw sized hole in the top of the pinea... | Cut the top off a small pineapple and carefull... | [['1 whole', 'Pineapple (fresh)'], ['9 cl', 'H... |
1 | Abbey | Coupe glass | Orange zest twist | SHAKE all ingredients with ice and fine strain... | [['4.5 cl', 'Rutte Dry Gin'], ['2.25 cl', 'Lil... |
2 | A.B.C. Cocktail | Nick & Nora glass | Lemon zest twist & Luxardo Maraschino cherry | TEAR mint and place in shaker. Add other ingre... | [['7 fresh', 'Mint leaves'], ['3 cl', 'Tawny p... |
3 | Absinthe Cocktail | Coupe glass | Mint leaf | SHAKE all ingredients with ice and fine strain... | [['3 cl', 'La Fée Parisienne absinthe'], ['7.5... |
4 | Absinthe Frappé | Old-fashioned glass | Mint sprig | SHAKE all ingredients with ice and fine strain... | [['4.5 cl', 'La Fée Parisienne absinthe'], ['1... |
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6956 entries, 0 to 6955
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   title        6956 non-null   object
 1   glass        6956 non-null   object
 2   garnish      6667 non-null   object
 3   recipe       6955 non-null   object
 4   ingredients  6956 non-null   object
dtypes: object(5)
memory usage: 271.8+ KB
Warm-Up Activities¶
1. Convert Titles to Lowercase!¶
In [5]:
lowercase_titles = df.title.str.lower()
2. Extract First Word from Each Title¶
In [8]:
first_words = df.title.str.split().str[0]
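As a small illustration of the two string operations used in the warm-up cells above, the sketch below applies them to a hand-made Series. The titles are copied from the `df.head()` preview; the name `sample_titles` is an assumption made for this example only.

```python
import pandas as pd

# Hypothetical sample: a few titles copied from the df.head() preview
sample_titles = pd.Series(['Abacaxi Ricaço', 'Abbey', 'A.B.C. Cocktail', 'Absinthe Frappé'])

# .str.lower() lowercases every element without an explicit loop
lowercase_sample = sample_titles.str.lower()

# .str.split() splits each title on whitespace; .str[0] keeps the first token
first_word_sample = sample_titles.str.split().str[0]

print(lowercase_sample.tolist())
print(first_word_sample.tolist())
```

Note that the "first word" of `A.B.C. Cocktail` is `A.B.C.`, because the split happens on whitespace only.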
3. Calculate the Ingredient Length Ratio¶
In [10]:
# Total number of characters across all entries in the `ingredients` column
total_ingredients = df['ingredients'].str.len().sum()
# Ratio of each ingredient string's length to the total character count
ingredient_length_ratio = df['ingredients'].str.len() / total_ingredients
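To make the ratio concrete, here is a minimal sketch on a made-up Series of strings (a stand-in for the `ingredients` column, not real data). Dividing a Series by a scalar is applied element-wise, and the resulting ratios add up to 1.

```python
import pandas as pd

# Hypothetical stand-in for the `ingredients` column: strings of different lengths
sample = pd.Series(['ab', 'abcd', 'abcdef', 'abcdefgh'])

lengths = sample.str.len()   # 2, 4, 6, 8
total = lengths.sum()        # 20
ratio = lengths / total      # element-wise division by a scalar

print(ratio.tolist())        # [0.1, 0.2, 0.3, 0.4]
print(ratio.sum())           # approximately 1.0
```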
Just for Exploration¶
Different cocktails call for different glass types, but which ones are the most common? This bar chart showcases the top 10 glass types used in cocktails, revealing the preferences of bartenders and mixologists alike. Let’s explore which glass shapes are the go-to choices for serving a variety of drinks!
In [ ]:
# Bar Chart of Top 10 Glass Types
def plot_top_glass_types(df):
    plt.figure(figsize=(12, 6))
    top_glasses = df['glass'].value_counts().nlargest(10)
    sns.barplot(x=top_glasses.index, y=top_glasses.values)
    plt.title('Top 10 Glass Types Used in Cocktails')
    plt.xlabel('Glass Type')
    plt.ylabel('Count')
    plt.xticks(rotation=45, ha='right')
    plt.tight_layout()
    plt.show()

plot_top_glass_types(df)
Activities¶
4. What is the primary advantage of using vectorized operations on a pandas Series over applying a function using a loop?¶
In [ ]:
total_recipes = len(df)  # Count the total number of cocktail recipes in the dataset
glass_popularity_ratio = df['glass'].value_counts() / total_recipes  # Share of recipes served in each glass type
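One commonly cited advantage is that a vectorized operation is written as a single expression over the whole Series, with the element-wise work done inside pandas/NumPy rather than in a Python-level loop, which is usually both shorter and faster. The sketch below is a minimal illustration under that assumption; `sample_lengths` is a hypothetical Series, not part of the dataset.

```python
import pandas as pd

# Hypothetical sample of string lengths (not read from the dataset)
sample_lengths = pd.Series([154, 175, 202])

# Loop version: square each element one at a time in Python
looped = pd.Series([value ** 2 for value in sample_lengths])

# Vectorized version: one expression applied to the whole Series
vectorized = sample_lengths ** 2

print(vectorized.tolist())                     # [23716, 30625, 40804]
print(vectorized.tolist() == looped.tolist())  # True
```

Both versions give the same numbers; the difference is in how the work is expressed and where it runs.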
5. Standardize the Recipe Length.¶
In [12]:
recipe_length = df['ingredients'].str.len()
recipe_length_standardized = (recipe_length - recipe_length.mean()) / recipe_length.std()
recipe_length_standardized
Out[12]:
0      -0.593126
1      -0.244785
2       0.203082
3      -0.825353
4      -0.261373
          ...
6951    1.131992
6952    1.364219
6953    1.629622
6954   -0.775590
6955    1.015878
Name: ingredients, Length: 6956, dtype: float64
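The standardization above is a z-score: subtract the mean, then divide by the standard deviation. A minimal sketch on made-up numbers (the values in `sample_lengths` are an assumption for illustration) shows that the result has mean 0 and standard deviation 1. Note that `Series.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical lengths, chosen only to keep the arithmetic easy to follow
sample_lengths = pd.Series([10, 20, 30, 40, 50])

# z-score: centre on the mean, scale by the (sample) standard deviation
z = (sample_lengths - sample_lengths.mean()) / sample_lengths.std()

print(z.round(4).tolist())   # [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]
print(abs(z.mean()) < 1e-12, abs(z.std() - 1) < 1e-12)
```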
6. Find the Glass Popularity Ratio¶
In [14]:
total_recipes = len(df)
glass_popularity_ratio = df['glass'].value_counts() / total_recipes
glass_popularity_ratio
Out[14]:
glass
Coupe glass                             0.317568
Martini glass                           0.162593
Old-fashioned glass                     0.156987
Collins glass                           0.127372
Nick & Nora glass                       0.045716
Highball (max 10oz/300ml)               0.029183
Rocks glass                             0.020989
Flute glass                             0.020558
Shot glass                              0.019551
Double old-fashioned                    0.013082
Sling glass                             0.010351
Wine glass                              0.009057
Hurricane glass                         0.007907
Toddy glass                             0.006901
Goblet glass                            0.005463
Fizz or Highball (8oz to 10oz)          0.005319
Tiki mug or collins                     0.004313
Martini (large 10oz) glass              0.003882
Copa glass                              0.003450
Sour or Martini/Coupette glass          0.003306
Margarita glass                         0.003019
Snifter glass                           0.002444
Absinthe glass or old-fashioned glass   0.002444
Martini (small) glass                   0.002156
Pint glass                              0.001725
Julep tin                               0.001725
Copper mug or Collins glass             0.001581
Large wine glass                        0.001581
Poco grande                             0.001294
Pineapple shell (frozen) glass          0.001150
Cup                                     0.001150
Boston glass                            0.001006
Boston & shot glass                     0.000575
Jam jar glass                           0.000575
Collins or Pineapple shell glass        0.000575
Coconut shell or Collins glass          0.000431
Medical cup or Old-fashioned glass      0.000431
Rum barrel mug or pint glass            0.000431
Copita sherry glass                     0.000431
Cartridge mug or Collins glass          0.000288
Cantaritos clay pot or Collins glass    0.000288
Shot and Beer glass                     0.000288
Tin                                     0.000288
Espresso cup                            0.000288
Shot & old fashioned glass              0.000144
Belgium beer glass                      0.000144
Name: count, dtype: float64
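For a column with no missing values, such as `glass` here, the same proportions can also be obtained with `value_counts(normalize=True)`. The sketch below compares the two approaches on a small made-up Series; `sample_glass` is a hypothetical name and the data is not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample with no missing values
sample_glass = pd.Series(['Coupe glass', 'Coupe glass', 'Martini glass', 'Collins glass'])

# Manual ratio: counts divided by the number of rows
manual = sample_glass.value_counts() / len(sample_glass)

# Built-in alternative: let value_counts return proportions directly
builtin = sample_glass.value_counts(normalize=True)

print(manual.to_dict() == builtin.to_dict())  # True
print(manual['Coupe glass'])                  # 0.5
```

The two only agree when nothing is missing: `value_counts` skips NaN by default, so with missing values `normalize=True` divides by the non-null count rather than by `len(...)`.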
7. Calculate the Garnish Effectiveness Index!¶
In [22]:
garnish_counts = df['garnish'].value_counts() # Count how many times each garnish is used across all cocktails
total_garnishes = garnish_counts.sum() # Calculate the total number of garnishes used
garnish_effectiveness_index = (garnish_counts / total_garnishes) * 100
garnish_effectiveness_index
Out[22]:
garnish
Orange zest twist                                                10.364482
Lemon zest twist                                                  8.774561
Lime wedge                                                        4.499775
Luxardo Maraschino cherry                                         2.429879
Orange slice                                                      1.979901
                                                                   ...
Lemon zest twist.                                                 0.014999
Lemon zest twist & small strip cured smoked pork on a skewer.     0.014999
Orange zest twist (cut to be cross-shaped)                        0.014999
Half Cadbury Creme Egg                                            0.014999
Orange zest twist & float 3 coffee beans                          0.014999
Name: count, Length: 2482, dtype: float64
In [21]:
df['garnish'].value_counts().sum()
Out[21]:
6667
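The total of 6667 matches the non-null count of `garnish` reported by `df.info()`, not the 6956 rows of the DataFrame, because `value_counts()` excludes missing values by default. A minimal sketch on made-up data (the garnish names are reused from the preview, the Series itself is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: five recipes, two of them without a garnish
sample_garnish = pd.Series(['Mint sprig', 'Lime wedge', np.nan, 'Mint sprig', np.nan])

counts = sample_garnish.value_counts()   # NaN is dropped by default (dropna=True)
total = counts.sum()                     # 3, not 5

# Percentage share of each garnish among recipes that have one
index_pct = counts / total * 100

print(total, len(sample_garnish))        # 3 5
print(index_pct.round(2).to_dict())
```

The percentages therefore describe the share among recipes that list a garnish, not among all recipes.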
Just for Exploration¶
Which garnishes steal the show in the cocktail world? This pie chart will reveal the top 10 garnishes used in cocktail recipes, giving us a taste of the most popular finishing touches. From fruity slices to aromatic herbs, let’s discover the garnishes that reign supreme in cocktail creations.
In [25]:
# Pie Chart of Top 10 Garnishes
def plot_top_garnishes(df):
    plt.figure(figsize=(10, 10))
    top_garnishes = df['garnish'].value_counts().nlargest(10)
    plt.pie(top_garnishes.values, labels=top_garnishes.index, autopct='%1.1f%%', startangle=90)
    plt.title('Top 10 Garnishes Used in Cocktails')
    plt.axis('equal')
    plt.tight_layout()
    plt.show()

plot_top_garnishes(df)
8. Calculate the Ingredient to Garnish Ratio¶
In [33]:
# Estimate the number of ingredients in each cocktail by counting the commas in `ingredients` and adding 1
ingredient_count = df['ingredients'].str.count(',') + 1
# Estimate the number of garnishes in each cocktail by counting the commas and adding 1
# Missing garnishes (NaN) are filled with -1 so that the count becomes 0 after adding 1
garnish_count = df['garnish'].str.count(',').fillna(-1) + 1
In [31]:
Out[31]:
0       1.0
1       1.0
2       1.0
3       1.0
4       1.0
       ...
6951    1.0
6952    1.0
6953    1.0
6954    1.0
6955    1.0
Name: garnish, Length: 6956, dtype: float64
In [37]:
# Estimate the number of ingredients in each cocktail by counting the commas and adding 1
# Note: each ingredient is stored as a [quantity, name] pair, so this counts list elements,
# which is roughly twice the number of ingredients
ingredient_count = df['ingredients'].str.count(',') + 1
# Estimate the number of garnishes in each cocktail by counting the commas and adding 1
# Missing garnishes (NaN) are filled with -1 so that the count becomes 0 after adding 1
garnish_count = df['garnish'].str.count(',').fillna(-1) + 1
# Compute the ratio of the ingredient count to the garnish count for each cocktail
# Add a small value (0.1) to the denominator to avoid division by zero
ingredient_to_garnish_ratio = ingredient_count / (garnish_count + 0.1)
ingredient_to_garnish_ratio
Out[37]:
0        7.272727
1        7.272727
2       10.000000
3        6.363636
4        8.181818
          ...
6951    10.909091
6952    11.818182
6953    12.727273
6954     7.272727
6955    10.909091
Length: 6956, dtype: float64
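Because each entry in `ingredients` is the text of a list of `[quantity, name]` pairs, the comma count is only a proxy for the number of ingredients. The sketch below, on two made-up rows written in the same format as the preview, compares the comma-based count with a count obtained by parsing the string with `ast.literal_eval`. This is an alternative way of counting, not the method the activity asks for, and it uses `apply`, which runs a Python function per row.

```python
import ast
import pandas as pd

# Hypothetical rows in the same string format as the `ingredients` preview
sample_ingredients = pd.Series([
    "[['4.5 cl', 'Gin'], ['2.25 cl', 'Orange juice']]",
    "[['3 cl', 'Absinthe'], ['7.5 cl', 'Chilled water'], ['1.5 cl', 'Sugar syrup']]",
])

# Comma-based estimate: counts every quantity and every name
comma_based = sample_ingredients.str.count(',') + 1

# Parsed count: turn each string into a real list and count its pairs
parsed_count = sample_ingredients.apply(ast.literal_eval).apply(len)

print(comma_based.tolist())   # [4, 6]
print(parsed_count.tolist())  # [2, 3]
```

On these rows the comma-based value is exactly twice the parsed count; an ingredient name or quantity that itself contains a comma would push the comma-based value higher still.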
9. Create the glass_usage_standardized Series¶
In [43]:
# Count how many times each glass type appears in the dataset
glass_usage = df['glass'].value_counts()
# standardize the counts by centering around the mean and scaling by the standard deviation
glass_usage_standardized = (glass_usage - glass_usage.mean()) / glass_usage.std()
glass_usage_standardized
Out[43]:
glass
Coupe glass                              5.102995
Martini glass                            2.429715
Old-fashioned glass                      2.333001
Collins glass                            1.822151
Nick & Nora glass                        0.413596
Highball (max 10oz/300ml)                0.128413
Rocks glass                             -0.012938
Flute glass                             -0.020378
Shot glass                              -0.037737
Double old-fashioned                    -0.149330
Sling glass                             -0.196447
Wine glass                              -0.218766
Hurricane glass                         -0.238605
Toddy glass                             -0.255964
Goblet glass                            -0.280762
Fizz or Highball (8oz to 10oz)          -0.283242
Tiki mug or collins                     -0.300601
Martini (large 10oz) glass              -0.308041
Copa glass                              -0.315480
Sour or Martini/Coupette glass          -0.317960
Margarita glass                         -0.322920
Snifter glass                           -0.332839
Absinthe glass or old-fashioned glass   -0.332839
Martini (small) glass                   -0.337799
Pint glass                              -0.345238
Julep tin                               -0.345238
Copper mug or Collins glass             -0.347718
Large wine glass                        -0.347718
Poco grande                             -0.352678
Pineapple shell (frozen) glass          -0.355158
Cup                                     -0.355158
Boston glass                            -0.357638
Boston & shot glass                     -0.365077
Jam jar glass                           -0.365077
Collins or Pineapple shell glass        -0.365077
Coconut shell or Collins glass          -0.367557
Medical cup or Old-fashioned glass      -0.367557
Rum barrel mug or pint glass            -0.367557
Copita sherry glass                     -0.367557
Cartridge mug or Collins glass          -0.370037
Cantaritos clay pot or Collins glass    -0.370037
Shot and Beer glass                     -0.370037
Tin                                     -0.370037
Espresso cup                            -0.370037
Shot & old fashioned glass              -0.372517
Belgium beer glass                      -0.372517
Name: count, dtype: float64
In [45]:
# Compute the ingredient intensity index by squaring the length of the ingredient strings for each cocktail
ingredient_intensity_index = df['ingredients'].str.len() ** 2
ingredient_intensity_index
Out[45]:
0       23716
1       30625
2       40804
3       19600
4       30276
        ...
6951    66564
6952    73984
6953    82944
6954    20449
6955    63001
Name: ingredients, Length: 6956, dtype: int64
In [ ]:
10. Which of the following operations is NOT a vectorized operation in pandas?¶
In [40]:
Out[40]:
glass
Coupe glass                             2209
Martini glass                           1131
Old-fashioned glass                     1092
Collins glass                            886
Nick & Nora glass                        318
Highball (max 10oz/300ml)                203
Rocks glass                              146
Flute glass                              143
Shot glass                               136
Double old-fashioned                      91
Sling glass                               72
Wine glass                                63
Hurricane glass                           55
Toddy glass                               48
Goblet glass                              38
Fizz or Highball (8oz to 10oz)            37
Tiki mug or collins                       30
Martini (large 10oz) glass                27
Copa glass                                24
Sour or Martini/Coupette glass            23
Margarita glass                           21
Snifter glass                             17
Absinthe glass or old-fashioned glass     17
Martini (small) glass                     15
Pint glass                                12
Julep tin                                 12
Copper mug or Collins glass               11
Large wine glass                          11
Poco grande                                9
Pineapple shell (frozen) glass             8
Cup                                        8
Boston glass                               7
Boston & shot glass                        4
Jam jar glass                              4
Collins or Pineapple shell glass           4
Coconut shell or Collins glass             3
Medical cup or Old-fashioned glass         3
Rum barrel mug or pint glass               3
Copita sherry glass                        3
Cartridge mug or Collins glass             2
Cantaritos clay pot or Collins glass       2
Shot and Beer glass                        2
Tin                                        2
Espresso cup                               2
Shot & old fashioned glass                 1
Belgium beer glass                         1
Name: count, dtype: int64
11. Calculate the ingredient_intensity_index Series¶
In [ ]:
ingredient_intensity_index = ...
Just for Exploration¶
Ever wondered how the type of glass influences the garnish used in cocktails? This heatmap uncovers the relationship between the top 10 glass types and garnishes, allowing us to explore the intricate combinations that mixologists use. Let’s dive into the world of cocktails and visualize how glassware and garnishes come together to create the perfect drink!
In [ ]:
# Heatmap of Glass Types vs Garnishes
def plot_glass_garnish_heatmap(df):
    cross_tab = pd.crosstab(df['glass'], df['garnish'])
    # Select top 10 glass types and top 10 garnishes
    top_glasses = df['glass'].value_counts().nlargest(10).index
    top_garnishes = df['garnish'].value_counts().nlargest(10).index
    # Filter the cross-tabulation
    heatmap_data = cross_tab.loc[top_glasses, top_garnishes]
    plt.figure(figsize=(12, 10))
    sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='YlOrRd')
    plt.title('Heatmap of Top 10 Glass Types vs Top 10 Garnishes')
    plt.xlabel('Garnish')
    plt.ylabel('Glass Type')
    plt.xticks(rotation=45, ha='right')
    plt.yticks(rotation=0)
    plt.tight_layout()
    plt.show()

plot_glass_garnish_heatmap(df)
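The heatmap is built on `pd.crosstab`, which counts how often each pair of values occurs together. A minimal sketch on a made-up DataFrame (the name `toy` and its rows are assumptions for illustration) shows the table that the heatmap would colour; rows with a missing garnish are left out of the counts.

```python
import pandas as pd

# Hypothetical data: five recipes, one of them without a garnish
toy = pd.DataFrame({
    'glass':   ['Coupe glass', 'Coupe glass', 'Martini glass', 'Coupe glass', 'Martini glass'],
    'garnish': ['Mint leaf', 'Orange zest twist', 'Orange zest twist', 'Orange zest twist', None],
})

# Rows are glass types, columns are garnishes, cells are co-occurrence counts
cross_tab = pd.crosstab(toy['glass'], toy['garnish'])

print(cross_tab)
print(cross_tab.loc['Coupe glass', 'Orange zest twist'])  # 2
```

Selecting a subset of rows and columns with `.loc`, as the plotting function does for the top 10 of each, keeps only the combinations of interest.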