Md Harun Or Roshid has successfully completed this project.

Intro to Pandas for Data Analysis

Cocoa Curations: Series Filtering with Chocolate Ratings

Finished

January 19, 2025 4:53 PM

Cocoa Curations: Series Filtering with Chocolate Ratings¶

Let's Start¶

In [2]:

# Importing necessary libraries

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [3]:

# Loading the dataset
data = pd.read_csv('flavors_of_cocoa.csv')

In [4]:

data.head()

Out[4]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63%	France	3.75		Sao Tome
1	A. Morin	Kpime	1676	2015	70%	France	2.75		Togo
2	A. Morin	Atsane	1676	2015	70%	France	3.00		Togo
3	A. Morin	Akata	1680	2015	70%	France	3.50		Togo
4	A. Morin	Quilla	1704	2015	70%	France	3.50		Peru

In [5]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1795 entries, 0 to 1794
Data columns (total 9 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Company 
(Maker-if known)         1795 non-null   object 
 1   Specific Bean Origin
or Bar Name  1795 non-null   object 
 2   REF                               1795 non-null   int64  
 3   Review
Date                       1795 non-null   int64  
 4   Cocoa
Percent                     1795 non-null   object 
 5   Company
Location                  1795 non-null   object 
 6   Rating                            1795 non-null   float64
 7   Bean
Type                         1794 non-null   object 
 8   Broad Bean
Origin                 1794 non-null   object 
dtypes: float64(1), int64(2), object(6)
memory usage: 126.3+ KB

In [6]:

# Note: the column names contain `\n`, so run the following line to see the exact names

data.columns

Out[6]:

Index(['Company \n(Maker-if known)', 'Specific Bean Origin\nor Bar Name',
       'REF', 'Review\nDate', 'Cocoa\nPercent', 'Company\nLocation', 'Rating',
       'Bean\nType', 'Broad Bean\nOrigin'],
      dtype='object')
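
The embedded `\n` characters make every column lookup awkward to type. One option, which this notebook does not use, is to normalise the names once after loading. A minimal sketch on a small made-up frame (the `demo` frame and its values are illustrative assumptions, not rows from `flavors_of_cocoa.csv`):

```python
import pandas as pd

# Hypothetical two-row frame that mimics the awkward headers in the dataset
demo = pd.DataFrame({
    'Company \n(Maker-if known)': ['Maker A', 'Maker B'],
    'Cocoa\nPercent': ['63%', '70%'],
    'Rating': [3.75, 2.75],
})

# Replace each newline with a space, then collapse repeated whitespace
demo.columns = (
    demo.columns
    .str.replace('\n', ' ', regex=False)
    .str.replace(r'\s+', ' ', regex=True)
    .str.strip()
)

print(list(demo.columns))
```

The remaining cells keep the original names, so a rename like this would have to be applied consistently before any of them run.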

In [7]:

data.describe()

Out[7]:

	REF	Review\nDate	Rating
count	1795.000000	1795.000000	1795.000000
mean	1035.904735	2012.325348	3.185933
std	552.886365	2.927210	0.478062
min	5.000000	2006.000000	1.000000
25%	576.000000	2010.000000	2.875000
50%	1069.000000	2013.000000	3.250000
75%	1502.000000	2015.000000	3.500000
max	1952.000000	2017.000000	5.000000

1. Which of the following methods is used to find the `top 5` chocolates based on their `Rating`?¶

In [13]:

# Identify the top chocolates by rating: nlargest(5, 'Rating') returns the top 5.
# 8 rows are shown here, which also reveals the ties at 4.0.
data.nlargest(8, 'Rating')

Out[13]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
78	Amedei	Chuao	111	2007	70%	Italy	5.0	Trinitario	Venezuela
86	Amedei	Toscano Black	40	2006	70%	Italy	5.0	Blend
9	A. Morin	Pablino	1319	2014	70%	France	4.0		Peru
17	A. Morin	Chuao	1015	2013	70%	France	4.0	Trinitario	Venezuela
20	A. Morin	Chanchamayo Province	1019	2013	63%	France	4.0		Peru
54	Amano	Morobe	725	2011	70%	U.S.A.	4.0		Papua New Guinea
56	Amano	Guayas	470	2010	70%	U.S.A.	4.0		Ecuador
76	Amedei	Porcelana	111	2007	70%	Italy	4.0	Criollo (Porcelana)	Venezuela
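
Only two bars in the output above score 5.0 and several tie at 4.0, so which of the tied rows appear depends on their order in the file. `nlargest` has a `keep` argument that controls this. A minimal sketch on made-up ratings (the `demo` frame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical ratings with a three-way tie at 4.0 (not taken from the dataset)
demo = pd.DataFrame({
    'bar': ['a', 'b', 'c', 'd', 'e'],
    'Rating': [5.0, 4.0, 4.0, 4.0, 3.0],
})

# Default keep='first': exactly n rows, ties broken by original row order
top_first = demo.nlargest(2, 'Rating')

# keep='all': every row tied with the last selected value is kept as well
top_all = demo.nlargest(2, 'Rating', keep='all')

print(top_first['bar'].tolist())
print(top_all['bar'].tolist())
```

With `keep='all'` the result can contain more than `n` rows, which is often what a "top N" question with ties really needs.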

2. Identify Low-Rated Chocolate Bars¶

In [29]:

# Note: .count() returns the non-null count for every column, not a single number
low_rated_count = data[data['Rating'] < 2].count()

In [36]:

data['Cocoa\nPercent'] = data['Cocoa\nPercent'].str.rstrip('%').astype(float)

In [38]:

high_cocoa_chocolates = data[data['Cocoa\nPercent'] > 70]

In [39]:

high_cocoa_chocolates

Out[39]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
26	Adi	Vanua Levu, Toto-A	705	2011	80.0	Fiji	3.25	Trinitario	Fiji
27	Adi	Vanua Levu	705	2011	88.0	Fiji	3.50	Trinitario	Fiji
28	Adi	Vanua Levu, Ami-Ami-CA	705	2011	72.0	Fiji	3.50	Trinitario	Fiji
32	Akesson's (Pralus)	Bali (west), Sukrama Family, Melaya area	636	2011	75.0	Switzerland	3.75	Trinitario	Indonesia
33	Akesson's (Pralus)	Madagascar, Ambolikapiky P.	502	2010	75.0	Switzerland	2.75	Criollo	Madagascar
...	...	...	...	...	...	...	...	...	...
1778	Zotter	Raw	1205	2014	80.0	Austria	2.75
1779	Zotter	Bocas del Toro, Cocabo Co-op	801	2012	72.0	Austria	3.50		Panama
1784	Zotter	El Oro	879	2012	75.0	Austria	3.00	Forastero (Nacional)	Ecuador
1785	Zotter	Huiwani Coop	879	2012	75.0	Austria	3.00	Criollo, Trinitario	Papua New Guinea
1786	Zotter	El Ceibo Coop	879	2012	90.0	Austria	3.25		Bolivia

795 rows × 9 columns
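
The conversion cell relies on the `.str` accessor, which only works while the column still holds strings, so running it a second time fails. A minimal sketch of a guarded conversion, assuming a hypothetical helper `to_percent_float` and made-up values in the same `NN%` format:

```python
import pandas as pd

def to_percent_float(col: pd.Series) -> pd.Series:
    """Return col as floats, stripping a trailing '%' when the values are strings."""
    if pd.api.types.is_numeric_dtype(col):
        return col.astype(float)  # already converted, nothing to strip
    return col.str.rstrip('%').astype(float)

# Hypothetical values, not rows from the dataset
demo = pd.DataFrame({'Cocoa\nPercent': ['63%', '70%', '88%']})

demo['Cocoa\nPercent'] = to_percent_float(demo['Cocoa\nPercent'])
demo['Cocoa\nPercent'] = to_percent_float(demo['Cocoa\nPercent'])  # safe to repeat

# Same filter as in the notebook: strictly more than 70 percent cocoa
high_cocoa = demo[demo['Cocoa\nPercent'] > 70]
print(high_cocoa)
```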

3. High Cocoa Percent Chocolates¶

In [ ]:

# Converting the 'Cocoa\nPercent' column into a float for filtering.
# astype(str) keeps this cell safe to re-run once the column is already float.
data['Cocoa\nPercent'] = data['Cocoa\nPercent'].astype(str).str.rstrip('%').astype(float)
high_cocoa_chocolates = ...

4. Count Chocolates Above Average Rating¶

In [41]:

# calculate mean rating 
mean_rating = data['Rating'].mean()

In [54]:

# Now create a Series storing the count of chocolate bars whose rating is above the mean rating calculated above
above_avg_chocolates = data[data['Rating'] > mean_rating].count().sort_values(ascending=True)

In [46]:

above_avg_chocolates

Out[46]:

Company \n(Maker-if known)           1005
Specific Bean Origin\nor Bar Name    1005
REF                                  1005
Review\nDate                         1005
Cocoa\nPercent                       1005
Company\nLocation                    1005
Rating                               1005
Bean\nType                           1004
Broad Bean\nOrigin                   1005
dtype: int64
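
`Bean\nType` shows 1004 rather than 1005 because `DataFrame.count()` counts non-null cells per column, and one of the selected rows has no bean type. When a single number of matching rows is wanted, summing the boolean mask avoids that ambiguity. A minimal sketch on made-up data (the `demo` frame is an assumption):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in 'Bean\nType'
demo = pd.DataFrame({
    'Rating': [2.0, 3.5, 4.0, 3.75],
    'Bean\nType': ['Criollo', np.nan, 'Trinitario', 'Blend'],
})

mask = demo['Rating'] > demo['Rating'].mean()

per_column = demo[mask].count()   # non-null cells per column (a Series)
n_rows = int(mask.sum())          # number of matching rows (a single int)

print(per_column)
print(n_rows)
```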

Just for Exploration¶

Do you enjoy a bit of a bitter kick in your chocolate? I've heard the more cocoa solids, the more intense the bitterness. But, does a higher cocoa percentage translate into a better rating, or do our taste buds crave something different? Let’s dive in and see how the cocoa percentage really influences expert ratings. You might be surprised by what makes a chocolate bar truly elite!

In [58]:

# Create scatter plot for cocoa percentage vs rating
plt.figure(figsize=(16, 12))
sns.scatterplot(data=data, x='Cocoa\nPercent', y='Rating', hue='Cocoa\nPercent', palette='coolwarm')
plt.title('Cocoa Percentage vs. Chocolate Rating')
plt.xlabel('Cocoa Percent')
plt.ylabel('Rating')
plt.show()

[Figure: scatter plot, "Cocoa Percentage vs. Chocolate Rating"; x-axis: Cocoa Percent, y-axis: Rating]
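
A scatter plot gives a visual impression; a correlation coefficient condenses it into one number. The sketch below uses `Series.corr`, which computes the Pearson correlation by default. The values are made up, so the result says nothing about the real dataset; in the notebook the same call would be applied to `data['Cocoa\nPercent']` and `data['Rating']`.

```python
import pandas as pd

# Hypothetical values chosen so that rating tends to fall as cocoa rises
demo = pd.DataFrame({
    'Cocoa\nPercent': [60.0, 70.0, 80.0, 90.0, 100.0],
    'Rating': [3.5, 3.75, 3.25, 3.0, 2.5],
})

# Pearson correlation: +1 means rising together, -1 means moving in
# opposite directions, and values near 0 mean no linear relationship
pearson = demo['Cocoa\nPercent'].corr(demo['Rating'])

print(round(pearson, 3))
```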

5. Identify Beans with High Cocoa and High Rating!¶

In [137]:

filtered_chocolates_series = data[(data['Cocoa\nPercent'] > 60) & (data['Rating'] >= 4.0)]['Specific Bean Origin\nor Bar Name']

In [70]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1795 entries, 0 to 1794
Data columns (total 9 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Company 
(Maker-if known)         1795 non-null   object 
 1   Specific Bean Origin
or Bar Name  1795 non-null   object 
 2   REF                               1795 non-null   int64  
 3   Review
Date                       1795 non-null   int64  
 4   Cocoa
Percent                     1795 non-null   float64
 5   Company
Location                  1795 non-null   object 
 6   Rating                            1795 non-null   float64
 7   Bean
Type                         1794 non-null   object 
 8   Broad Bean
Origin                 1794 non-null   object 
dtypes: float64(2), int64(2), object(5)
memory usage: 126.3+ KB

In [74]:

data[(data['Cocoa\nPercent'] > 60) & (data['Rating'] >= 4.0)]['Specific Bean Origin\nor Bar Name']

Out[74]:

9                    Pablino
17                     Chuao
20      Chanchamayo Province
54                    Morobe
56                    Guayas
                ...         
1687     Porcelana, Pedegral
1693                 Manjari
1699                 Guanaja
1739              Los Llanos
1756                 Ocumare
Name: Specific Bean Origin\nor Bar Name, Length: 99, dtype: object
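
The filter above first selects rows and then picks a column in a second step. An equivalent way to get the same Series is `.loc`, which takes the row mask and the column label together. A minimal sketch on made-up rows (column names match the dataset, values do not):

```python
import pandas as pd

# Hypothetical rows for illustration only
demo = pd.DataFrame({
    'Specific Bean Origin\nor Bar Name': ['Alpha', 'Beta', 'Gamma', 'Delta'],
    'Cocoa\nPercent': [55.0, 70.0, 85.0, 72.0],
    'Rating': [4.0, 4.25, 3.5, 4.0],
})

# Each comparison needs its own parentheses because & binds tighter than > and >=
mask = (demo['Cocoa\nPercent'] > 60) & (demo['Rating'] >= 4.0)

# .loc takes the row mask and the column label together and returns a Series
names = demo.loc[mask, 'Specific Bean Origin\nor Bar Name']

print(names.tolist())
```

Naming the mask also makes it easy to reuse, for example `mask.sum()` for the number of matching bars.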

In [63]:

data

Out[63]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63.0	France	3.75		Sao Tome
1	A. Morin	Kpime	1676	2015	70.0	France	2.75		Togo
2	A. Morin	Atsane	1676	2015	70.0	France	3.00		Togo
3	A. Morin	Akata	1680	2015	70.0	France	3.50		Togo
4	A. Morin	Quilla	1704	2015	70.0	France	3.50		Peru
...	...	...	...	...	...	...	...	...	...
1790	Zotter	Peru	647	2011	70.0	Austria	3.75		Peru
1791	Zotter	Congo	749	2011	65.0	Austria	3.00	Forastero	Congo
1792	Zotter	Kerala State	749	2011	65.0	Austria	3.50	Forastero	India
1793	Zotter	Kerala State	781	2011	62.0	Austria	3.25		India
1794	Zotter	Brazil, Mitzi Blue	486	2010	65.0	Austria	3.00		Brazil

1795 rows × 9 columns

6. Count Extreme Chocolates¶

In [81]:

extreme_chocolates = data[(data['Rating'] < 2) | (data['Cocoa\nPercent'] > 90)].count()

In [80]:

Out[80]:

Company \n(Maker-if known)           34
Specific Bean Origin\nor Bar Name    34
REF                                  34
Review\nDate                         34
Cocoa\nPercent                       34
Company\nLocation                    34
Rating                               34
Bean\nType                           34
Broad Bean\nOrigin                   34
dtype: int64

7. What is the correct syntax to filter chocolates with a rating greater than `4.5` and a cocoa percentage less than `70%`?¶

In [90]:

# Provide the correct syntax to filter chocolates with a rating greater than `4.5` and a cocoa percentage less than `70%`.
data[(data['Rating'] > 4.5) & (data['Cocoa\nPercent'] < 70)].count()

Out[90]:

Company \n(Maker-if known)           0
Specific Bean Origin\nor Bar Name    0
REF                                  0
Review\nDate                         0
Cocoa\nPercent                       0
Company\nLocation                    0
Rating                               0
Bean\nType                           0
Broad Bean\nOrigin                   0
dtype: int64

8. Count High-Rated Venezuelan Chocolates¶

In [98]:

venezuela_chocolates = data[(data['Broad Bean\nOrigin'] == 'Venezuela') & (data['Rating'] > 3.5)].count()

In [99]:

venezuela_chocolates

Out[99]:

Company \n(Maker-if known)           54
Specific Bean Origin\nor Bar Name    54
REF                                  54
Review\nDate                         54
Cocoa\nPercent                       54
Company\nLocation                    54
Rating                               54
Bean\nType                           54
Broad Bean\nOrigin                   54
dtype: int64

Just for Exploration¶

Ever wondered which country produces the finest chocolate? Is it the lush rainforests of South America, or perhaps the exotic plantations of Africa? This visualization uncovers the countries that consistently produce the highest-rated chocolate bars. You might find some surprising contenders that produce top-quality bars recognized by chocolate connoisseurs worldwide!

In [103]:

# Group by 'Company\nLocation' and calculate the mean rating
top_countries = data.groupby('Company\nLocation')['Rating'].mean().sort_values(ascending=False).head(10)

# Plot the top 10 countries with highest-rated chocolates
plt.figure(figsize=(10, 6))
# Assign the y variable to hue and hide the legend, as the seaborn FutureWarning recommends
sns.barplot(x=top_countries.values, y=top_countries.index, hue=top_countries.index, palette='viridis', legend=False)
plt.title('Top 10 Countries Producing Highest-Rated Chocolate Bars')
plt.xlabel('Average Rating')
plt.ylabel('Company Location')
plt.show()

9. Recent High-Rated Bars¶

In [110]:

recent_high_rated_count = ...

In [116]:

recent_high_rated_count = data[(data['Review\nDate'] > 2015) & (data['Rating'] >= 4)].count()

In [109]:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[109], line 1
----> 1 data[(data['Review\nDate' >2015]) & (data['Rating'] >=4)].count()

TypeError: '>' not supported between instances of 'str' and 'int'
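
The traceback above comes from a misplaced bracket: `'Review\nDate' >2015` sits inside the square brackets, so Python compares the column name (a string) with an integer before pandas is involved at all. A minimal sketch that reproduces the error and the corrected form on made-up rows (the `demo` frame is an assumption):

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset
demo = pd.DataFrame({
    'Review\nDate': [2014, 2016, 2017],
    'Rating': [4.0, 4.0, 3.0],
})

# Misplaced bracket: the comparison is str > int and fails immediately
try:
    demo['Review\nDate' > 2015]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Bracket closed right after the column name: the comparison is Series > int
recent_high = demo[(demo['Review\nDate'] > 2015) & (demo['Rating'] >= 4)]

print(error_message)
print(len(recent_high))
```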

10. Most Common Bean Origin for Highly Rated Chocolates¶

In [118]:

top_rated_common_origin = ...

In [122]:

data['Broad Bean\nOrigin'].mode()

Out[122]:

0    Venezuela
Name: Broad Bean\nOrigin, dtype: object

In [126]:

data[data['Rating'] > 3.5]['Broad Bean\nOrigin'].mode().iloc[0]

Out[126]:

'Venezuela'
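
`mode()` returns a Series because several values can tie for most frequent, and `.iloc[0]` simply takes the first of them. `value_counts()` shows the counts themselves, which makes a tie visible. A minimal sketch on made-up origins and ratings (not the dataset's values):

```python
import pandas as pd

# Hypothetical origins and ratings, including one missing origin
demo = pd.DataFrame({
    'Broad Bean\nOrigin': ['Peru', 'Venezuela', 'Venezuela', 'Peru', None, 'Ecuador'],
    'Rating': [3.75, 4.0, 3.75, 4.0, 4.0, 3.0],
})

top_rated = demo[demo['Rating'] > 3.5]

# Two origins tie here, so mode() returns both, in sorted order
modes = top_rated['Broad Bean\nOrigin'].mode()

# value_counts() skips missing values by default and shows each count
counts = top_rated['Broad Bean\nOrigin'].value_counts()

print(modes.tolist())
print(counts.to_dict())
```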

11. Average Rating by Company Location¶

In [130]:

# Get unique company locations
unique_locations = data['Company\nLocation'].unique()

In [131]:

# Initialize a dictionary to store results
location_stats = {}

# Calculate stats for each location
for location in unique_locations:
    location_data = data[data['Company\nLocation'] == location]
    count = len(location_data)
    if count >= 10:  # Only consider locations with at least 10 reviews
        # Use a separate name so the overall mean_rating from section 4 is not overwritten
        location_mean = location_data['Rating'].mean()  # mean rating for this location
        location_stats[location] = location_mean

In [133]:

# Convert results to a Series and sort
avg_rating_by_location = pd.Series(location_stats).sort_values(ascending=False)
avg_rating_by_location

Out[133]:

Vietnam        3.409091
Brazil         3.397059
Australia      3.357143
Guatemala      3.350000
Switzerland    3.342105
Italy          3.325397
Scotland       3.325000
Canada         3.324000
Denmark        3.283333
Spain          3.270000
France         3.251603
Austria        3.240385
Hungary        3.204545
New Zealand    3.191176
Germany        3.178571
Venezuela      3.175000
Colombia       3.173913
U.S.A.         3.154123
Madagascar     3.147059
Belgium        3.093750
Japan          3.088235
U.K.           3.054688
Ecuador        3.009259
Peru           2.897059
dtype: float64
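
The loop above can also be written as a single `groupby` that computes the mean and the review count together. This is an alternative sketch, not the notebook's method; the `demo` frame and the lowered threshold `MIN_REVIEWS` are assumptions chosen to fit a tiny example.

```python
import pandas as pd

# Hypothetical reviews: location 'A' has 3 rows, 'B' has 1
demo = pd.DataFrame({
    'Company\nLocation': ['A', 'A', 'A', 'B'],
    'Rating': [3.0, 3.5, 4.0, 5.0],
})

MIN_REVIEWS = 3  # the notebook uses 10; lowered here to fit the tiny example

# One pass: mean rating and review count per location
stats = demo.groupby('Company\nLocation')['Rating'].agg(['mean', 'count'])

# Keep locations with enough reviews, then sort by mean rating
avg_by_location = (
    stats.loc[stats['count'] >= MIN_REVIEWS, 'mean']
    .sort_values(ascending=False)
)

print(avg_by_location.to_dict())
```

Location `B` has the highest mean but only one review, so the threshold removes it, which is the same safeguard the loop applies with `count >= 10`.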

Statement of Completion#c6fb4b2e
