Abraham SosaG has successfully completed this project.

Intro to Pandas for Data Analysis


Cocoa Curations: Series Filtering with Chocolate Ratings


March 18, 2025 10:22 PM


Let's Start

In [1]:

# Importing necessary libraries

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [2]:

# Loading the dataset
data = pd.read_csv('flavors_of_cocoa.csv')

In [3]:

data.head()

Out[3]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63%	France	3.75		Sao Tome
1	A. Morin	Kpime	1676	2015	70%	France	2.75		Togo
2	A. Morin	Atsane	1676	2015	70%	France	3.00		Togo
3	A. Morin	Akata	1680	2015	70%	France	3.50		Togo
4	A. Morin	Quilla	1704	2015	70%	France	3.50		Peru

In [4]:

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1795 entries, 0 to 1794
Data columns (total 9 columns):
 #   Column                             Non-Null Count  Dtype
---  ------                             --------------  -----
 0   Company \n(Maker-if known)         1795 non-null   object
 1   Specific Bean Origin\nor Bar Name  1795 non-null   object
 2   REF                                1795 non-null   int64
 3   Review\nDate                       1795 non-null   int64
 4   Cocoa\nPercent                     1795 non-null   object
 5   Company\nLocation                  1795 non-null   object
 6   Rating                             1795 non-null   float64
 7   Bean\nType                         1794 non-null   object
 8   Broad Bean\nOrigin                 1794 non-null   object
dtypes: float64(1), int64(2), object(6)
memory usage: 126.3+ KB

In [5]:

# Note: the column names contain `\n`, so run the following line to see their exact spelling

data.columns

Out[5]:

Index(['Company \n(Maker-if known)', 'Specific Bean Origin\nor Bar Name',
       'REF', 'Review\nDate', 'Cocoa\nPercent', 'Company\nLocation', 'Rating',
       'Bean\nType', 'Broad Bean\nOrigin'],
      dtype='object')
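Because the raw column names contain embedded newlines, every reference has to spell out `\n`. A minimal sketch of one way to normalize such names, shown on a small hypothetical DataFrame rather than on `data`, so the later cells (which use the original names) are unaffected:

```python
import pandas as pd

# Hypothetical frame with the same kind of newline-containing headers as the CSV
df = pd.DataFrame({
    "Company \n(Maker-if known)": ["A. Morin"],
    "Cocoa\nPercent": ["63%"],
})

# Collapse any run of whitespace (including "\n" and " \n") into a single space
df.columns = df.columns.str.replace(r"\s+", " ", regex=True)

print(list(df.columns))  # ['Company (Maker-if known)', 'Cocoa Percent']
```

Using the regex `\s+` rather than replacing `"\n"` alone avoids a double space in names such as `Company \n(Maker-if known)`, where a space already precedes the newline.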

In [6]:

data.describe()

Out[6]:

	REF	Review\nDate	Rating
count	1795.000000	1795.000000	1795.000000
mean	1035.904735	2012.325348	3.185933
std	552.886365	2.927210	0.478062
min	5.000000	2006.000000	1.000000
25%	576.000000	2010.000000	2.875000
50%	1069.000000	2013.000000	3.250000
75%	1502.000000	2015.000000	3.500000
max	1952.000000	2017.000000	5.000000

1. Which of the following methods is used to find the `top 5` chocolates based on their `Rating`?

In [16]:

# Identify the top 5 chocolates based on their rating

In [17]:

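# nsmallest returns the 5 lowest-rated bars (shown below); data.nlargest(5, "Rating") gives the top 5
data.nsmallest(5, "Rating")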

Out[17]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
326	Callebaut	Baking	141	2007	70%	Belgium	1.0		Ecuador
437	Claudio Corallo	Principe	252	2008	100%	Sao Tome	1.0	Forastero	Sao Tome & Principe
465	Cote d' Or (Kraft)	Sensations Intense	48	2006	70%	Belgium	1.0
1175	Neuhaus (Callebaut)	Dark	135	2007	73%	Belgium	1.0
245	Bonnat	One Hundred	81	2006	100%	France	1.5
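`DataFrame.nsmallest` returns the rows with the lowest values, which is why Out[17] lists the 1.0-rated bars; the top 5 by rating come from `DataFrame.nlargest`. A small side-by-side comparison on made-up ratings (the `bars` frame is hypothetical):

```python
import pandas as pd

# Hypothetical ratings, just to contrast the two methods
bars = pd.DataFrame({
    "Bar": ["A", "B", "C", "D", "E", "F"],
    "Rating": [3.0, 4.0, 1.0, 3.5, 5.0, 2.5],
})

top_2 = bars.nlargest(2, "Rating")      # highest ratings first
bottom_2 = bars.nsmallest(2, "Rating")  # lowest ratings first

print(top_2["Bar"].tolist())     # ['E', 'B']
print(bottom_2["Bar"].tolist())  # ['C', 'F']
```

Both methods take a `keep` argument for ties at the cutoff: the default `keep='first'` keeps the earliest rows, while `keep='all'` returns every tied row, so the result can be longer than `n`.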

2. Identify Low-Rated Chocolate Bars

In [21]:

data.loc[data["Rating"] < 2].count()

Out[21]:

Company \n(Maker-if known)           17
Specific Bean Origin\nor Bar Name    17
REF                                  17
Review\nDate                         17
Cocoa\nPercent                       17
Company\nLocation                    17
Rating                               17
Bean\nType                           17
Broad Bean\nOrigin                   17
dtype: int64

In [22]:

low_rated_count = data.loc[data["Rating"] < 2].count()
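Calling `.count()` on the filtered DataFrame returns one non-null count per column (the Series in Out[21]). If a single integer is what you need, a common idiom is to sum the boolean mask, since `True` counts as 1. A minimal sketch on hypothetical ratings:

```python
import pandas as pd

# Hypothetical ratings
ratings = pd.Series([1.0, 1.5, 2.0, 3.25, 1.75])

low_mask = ratings < 2                 # boolean Series, True where the rating is below 2
low_rated_total = int(low_mask.sum())  # each True adds 1, so this is the number of matching bars

print(low_rated_total)  # 3
```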

3. High Cocoa Percent Chocolates

In [ ]:

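# Convert the 'Cocoa\nPercent' column (strings such as '70%') into floats for filtering.
# Run the conversion only once: afterwards the column is numeric and the .str accessor no longer works.
data['Cocoa\nPercent'] = data['Cocoa\nPercent'].str.rstrip('%').astype(float)
high_cocoa_chocolates = data.loc[data['Cocoa\nPercent'] > 70]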

In [24]:

data['Cocoa\nPercent'] = data['Cocoa\nPercent'].str.rstrip('%').astype(float)

In [27]:

data.loc[data["Cocoa\nPercent"] > 70]

Out[27]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
26	Adi	Vanua Levu, Toto-A	705	2011	80.0	Fiji	3.25	Trinitario	Fiji
27	Adi	Vanua Levu	705	2011	88.0	Fiji	3.50	Trinitario	Fiji
28	Adi	Vanua Levu, Ami-Ami-CA	705	2011	72.0	Fiji	3.50	Trinitario	Fiji
32	Akesson's (Pralus)	Bali (west), Sukrama Family, Melaya area	636	2011	75.0	Switzerland	3.75	Trinitario	Indonesia
33	Akesson's (Pralus)	Madagascar, Ambolikapiky P.	502	2010	75.0	Switzerland	2.75	Criollo	Madagascar
...	...	...	...	...	...	...	...	...	...
1778	Zotter	Raw	1205	2014	80.0	Austria	2.75
1779	Zotter	Bocas del Toro, Cocabo Co-op	801	2012	72.0	Austria	3.50		Panama
1784	Zotter	El Oro	879	2012	75.0	Austria	3.00	Forastero (Nacional)	Ecuador
1785	Zotter	Huiwani Coop	879	2012	75.0	Austria	3.00	Criollo, Trinitario	Papua New Guinea
1786	Zotter	El Ceibo Coop	879	2012	90.0	Austria	3.25		Bolivia

795 rows × 9 columns
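The conversion in In [24] can only run once: after it, the column holds floats and `.str` raises an `AttributeError` on non-string values, so re-running a conversion cell fails. A sketch of a hypothetical helper, `percent_to_float`, that is safe to apply repeatedly:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def percent_to_float(s: pd.Series) -> pd.Series:
    """Convert strings like '70%' to 70.0; leave already-numeric columns as floats."""
    if is_numeric_dtype(s):
        # Already converted (e.g. the cell was re-run), so only ensure a float dtype
        return s.astype(float)
    return s.str.rstrip("%").astype(float)


cocoa = pd.Series(["63%", "70%", "72.5%"])  # hypothetical values
once = percent_to_float(cocoa)
twice = percent_to_float(once)  # a second call is harmless

print(once.tolist())      # [63.0, 70.0, 72.5]
print((once > 70).sum())  # 1
```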

4. Count Chocolates Above Average Rating

In [29]:

mean_rating = data["Rating"].mean()
mean_rating

Out[29]:

3.185933147632312

In [30]:

# Calculate the mean rating
mean_rating = data["Rating"].mean()
# Count, per column, the chocolate bars whose rating is above the mean rating calculated above
above_avg_chocolates = data.loc[data["Rating"] > mean_rating].count()
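Because `.count()` reports non-null values per column, the entries of a Series like `above_avg_chocolates` can disagree whenever a column has gaps; `info()` above shows `Bean\nType` and `Broad Bean\nOrigin` with 1794 non-null values rather than 1795. A toy illustration with one hypothetical missing bean type, contrasting `.count()` with `len()`:

```python
import numpy as np
import pandas as pd

# Hypothetical slice with one missing Bean Type
df = pd.DataFrame({
    "Rating": [3.5, 3.75, 2.5, 4.0],
    "Bean Type": ["Criollo", np.nan, "Forastero", "Trinitario"],
})

# Mean rating is 3.4375, so three rows are above average
above = df.loc[df["Rating"] > df["Rating"].mean()]

# count() reports non-null values per column, so the columns can disagree
counts = above.count()
print(counts["Rating"], counts["Bean Type"])  # 3 2

# len() gives the number of matching rows regardless of missing values
print(len(above))  # 3
```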

Just for Exploration

Do you enjoy a bit of a bitter kick in your chocolate? I've heard that the more cocoa solids a bar contains, the more intense its bitterness. But does a higher cocoa percentage translate into a better rating, or do our taste buds crave something different? Let's dive in and see how cocoa percentage relates to expert ratings. You might be surprised by what makes a chocolate bar truly elite!

In [32]:

# Create scatter plot for cocoa percentage vs rating
plt.figure(figsize=(16, 12))
sns.scatterplot(data=data, x='Cocoa\nPercent', y='Rating', hue='Cocoa\nPercent', palette='coolwarm')
plt.title('Cocoa Percentage vs. Chocolate Rating')
plt.xlabel('Cocoa Percent')
plt.ylabel('Rating')
plt.show()

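[Figure: "Cocoa Percentage vs. Chocolate Rating", a scatter plot of Cocoa Percent (x-axis) against Rating (y-axis), with points colored by cocoa percentage]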
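The scatter plot answers the question visually. As a rough numeric summary, one option is a correlation coefficient: Pearson for a linear trend and Spearman for any monotonic trend. The sketch below uses a hypothetical helper, `cocoa_rating_correlation`, and made-up values; Spearman is computed as the Pearson correlation of the ranks (ties get average ranks).

```python
import pandas as pd


def cocoa_rating_correlation(cocoa: pd.Series, rating: pd.Series) -> dict:
    """Return Pearson and Spearman correlations between two numeric Series."""
    pearson = cocoa.corr(rating)                  # linear association
    spearman = cocoa.rank().corr(rating.rank())   # Pearson on ranks = Spearman
    return {"pearson": pearson, "spearman": spearman}


# Hypothetical values only; on the notebook's data you would pass
# data["Cocoa\nPercent"] and data["Rating"] after the float conversion.
cocoa = pd.Series([60.0, 70.0, 75.0, 85.0, 100.0])
rating = pd.Series([3.5, 3.25, 3.0, 2.75, 2.0])

result = cocoa_rating_correlation(cocoa, rating)
print(result)
```

A coefficient near zero would suggest little monotonic relationship; it says nothing about cause, only about how the two columns move together.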

5. Identify Beans with High Cocoa and High Rating!

In [41]:

data.head()

Out[41]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63.0	France	3.75		Sao Tome
1	A. Morin	Kpime	1676	2015	70.0	France	2.75		Togo
2	A. Morin	Atsane	1676	2015	70.0	France	3.00		Togo
3	A. Morin	Akata	1680	2015	70.0	France	3.50		Togo
4	A. Morin	Quilla	1704	2015	70.0	France	3.50		Peru

In [43]:

filtered_chocolates_series = data.loc[
    (data["Cocoa\nPercent"] > 60) & (data["Rating"] >= 4)
]["Specific Bean Origin\nor Bar Name"]

In [44]:

filtered_chocolates_series

Out[44]:

9                    Pablino
17                     Chuao
20      Chanchamayo Province
54                    Morobe
56                    Guayas
                ...         
1687     Porcelana, Pedegral
1693                 Manjari
1699                 Guanaja
1739              Los Llanos
1756                 Ocumare
Name: Specific Bean Origin\nor Bar Name, Length: 99, dtype: object
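The cell above selects rows with `data.loc[mask]` and then picks the column in a second step. Passing the mask and the column label to a single `.loc` call returns the same Series in one step, and it avoids chained indexing, which pandas warns about if you later assign to the result. A sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical bars
df = pd.DataFrame({
    "Bar": ["A", "B", "C"],
    "Cocoa Percent": [70.0, 70.0, 64.0],
    "Rating": [4.0, 2.75, 4.0],
})

mask = (df["Cocoa Percent"] > 60) & (df["Rating"] >= 4)

# Row mask and column label in one .loc call return the column as a Series
names = df.loc[mask, "Bar"]
print(names.tolist())  # ['A', 'C']
```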

6. Count Extreme Chocolates

In [48]:

extreme_chocolates = data.loc[
    (data["Rating"] < 2) | (data["Cocoa\nPercent"] > 90)
].count()

In [50]:

extreme_chocolates

Out[50]:

Company \n(Maker-if known)           34
Specific Bean Origin\nor Bar Name    34
REF                                  34
Review\nDate                         34
Cocoa\nPercent                       34
Company\nLocation                    34
Rating                               34
Bean\nType                           34
Broad Bean\nOrigin                   34
dtype: int64
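`|` keeps rows that meet either condition, and a bar meeting both is counted once. Each comparison needs its own parentheses because `|` and `&` bind more tightly than `<` and `>` in Python. Since Out[21] found 17 bars rated below 2, the other 17 of the 34 must be bars above 90% cocoa that are not also rated below 2. A toy sketch with hypothetical values, summing the mask to get a single count:

```python
import pandas as pd

# Hypothetical bars
df = pd.DataFrame({
    "Rating": [1.5, 3.0, 2.0, 1.0],
    "Cocoa Percent": [70.0, 80.0, 91.0, 95.0],
})

# Parentheses are required: | binds more tightly than < and >
extreme_mask = (df["Rating"] < 2) | (df["Cocoa Percent"] > 90)

# The last row meets both conditions but is still counted only once
print(int(extreme_mask.sum()))  # 3
```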

7. What is the correct syntax to filter chocolates with a rating greater than `4.5` and a cocoa percentage less than `70%`?

In [ ]:

# Filter chocolates with a rating greater than 4.5 and a cocoa percentage less than 70%
# (each condition in its own parentheses, combined with &)
data.loc[(data["Rating"] > 4.5) & (data["Cocoa\nPercent"] < 70)]
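The notebook writes every filter as a boolean mask. `DataFrame.query` is an alternative that takes the same conditions as a string, with backticks around column names that contain spaces. The sketch uses a hypothetical frame with a space-separated `Cocoa Percent` column; whether backtick quoting accepts the raw `Cocoa\nPercent` name is not checked here.

```python
import pandas as pd

# Hypothetical bars
df = pd.DataFrame({
    "Rating": [4.75, 5.0, 3.5],
    "Cocoa Percent": [65.0, 75.0, 60.0],
})

# Boolean-mask form (same shape as the notebook's other filters)
by_mask = df.loc[(df["Rating"] > 4.5) & (df["Cocoa Percent"] < 70)]

# query() form; backticks quote a column name containing a space
by_query = df.query("Rating > 4.5 and `Cocoa Percent` < 70")

print(by_mask.equals(by_query))  # True
print(len(by_mask))              # 1
```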

8. Count High-Rated Venezuelan Chocolates

In [51]:

data.head(1)

Out[51]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63.0	France	3.75		Sao Tome

In [52]:

venezuela_chocolates = data.loc[
    (data["Broad Bean\nOrigin"] == "Venezuela") & (data["Rating"] > 3.5)
].count()
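An equality test only matches cells whose origin is exactly `"Venezuela"`. If some origins list several countries in one string, those rows are excluded, which may or may not be what the question intends. The combined-origin value below is made up for illustration; `data["Broad Bean\nOrigin"].unique()` shows what the real column contains.

```python
import pandas as pd

# Hypothetical origins and ratings (the combined origin is invented)
origins = pd.Series(["Venezuela", "Venezuela, Trinidad", "Peru", None])
ratings = pd.Series([3.75, 4.0, 4.0, 3.75])

exact = (origins == "Venezuela") & (ratings > 3.5)

# na=False treats missing origins as non-matches instead of propagating NaN
partial = origins.str.contains("Venezuela", na=False, regex=False) & (ratings > 3.5)

print(int(exact.sum()), int(partial.sum()))  # 1 2
```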

Just for Exploration

Ever wondered which country produces the finest chocolate? This visualization ranks the top 10 company locations (where the chocolate maker is based, not where the beans are grown) by average rating. Keep in mind that some locations have only one or two reviews, so their averages rest on very little data. You might still find some surprising contenders among the makers recognized by chocolate connoisseurs worldwide!

In [56]:

# Group by 'Company Location' and calculate the mean rating
top_countries = data.groupby('Company\nLocation')['Rating'].mean().sort_values(ascending=False).head(10)

# Plot the top 10 company locations with the highest average ratings
plt.figure(figsize=(10, 6))
# Assigning the location to `hue` (with legend=False) applies the palette without seaborn's deprecation warning
sns.barplot(x=top_countries.values, y=top_countries.index, hue=top_countries.index, palette='viridis', legend=False)
plt.title('Top 10 Countries Producing Highest-Rated Chocolate Bars')
plt.xlabel('Average Rating')
plt.ylabel('Company Location')
plt.show()

[Figure: "Top 10 Countries Producing Highest-Rated Chocolate Bars", a horizontal bar chart with Average Rating on the x-axis and Company Location on the y-axis]

9. Recent High-Rated Bars

In [57]:

data.head(1)

Out[57]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63.0	France	3.75		Sao Tome

In [62]:

recent_high_rated_count = data.loc[
    (data["Review\nDate"] > 2015) & (data["Rating"] >= 4)
].count()

10. Most Common Bean Origin for Highly Rated Chocolates

In [76]:

top_rated_common_origin = data.loc[
    data["Rating"] > 3.5
]['Broad Bean\nOrigin'].mode().iloc[0]

In [77]:

top_rated_common_origin

Out[77]:

'Venezuela'
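`mode()` returns every most-frequent value in sorted order, so when two origins tie, `.iloc[0]` silently keeps the alphabetically first one. A quick check on hypothetical origins:

```python
import pandas as pd

# Hypothetical origins with a tie for most frequent
origins = pd.Series(["Peru", "Venezuela", "Peru", "Venezuela", "Ecuador"])

# mode() returns all tied values, sorted
print(origins.mode().tolist())  # ['Peru', 'Venezuela']

# .iloc[0] therefore picks the first of the tied values
print(origins.mode().iloc[0])   # Peru

# value_counts() shows how often the leading value occurs
counts = origins.value_counts()
print(counts.max())             # 2
```

Comparing `mode()` with `value_counts().head()` on the real filtered column shows whether 'Venezuela' leads outright or ties.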

11. Average Rating by Company Location

In [85]:

# Get unique company locations
unique_locations = data["Company\nLocation"].unique()

In [78]:

data.head(1)

Out[78]:

	Company \n(Maker-if known)	Specific Bean Origin\nor Bar Name	REF	Review\nDate	Cocoa\nPercent	Company\nLocation	Rating	Bean\nType	Broad Bean\nOrigin
0	A. Morin	Agua Grande	1876	2016	63.0	France	3.75		Sao Tome

In [84]:

data["Company\nLocation"].value_counts()

Out[84]:

Company\nLocation
U.S.A.               764
France               156
Canada               125
U.K.                  96
Italy                 63
Ecuador               54
Australia             49
Belgium               40
Switzerland           38
Germany               35
Austria               26
Spain                 25
Colombia              23
Hungary               22
Venezuela             20
Peru                  17
New Zealand           17
Madagascar            17
Japan                 17
Brazil                17
Denmark               15
Vietnam               11
Guatemala             10
Scotland              10
Argentina              9
Israel                 9
Costa Rica             9
Poland                 8
Honduras               6
Lithuania              6
South Korea            5
Nicaragua              5
Sweden                 5
Domincan Republic      5
Netherlands            4
Mexico                 4
Puerto Rico            4
Fiji                   4
Sao Tome               4
Amsterdam              4
Ireland                4
South Africa           3
Singapore              3
Iceland                3
Portugal               3
Grenada                3
Finland                2
St. Lucia              2
Chile                  2
Bolivia                2
Wales                  1
Russia                 1
Martinique             1
Czech Republic         1
India                  1
Philippines            1
Ghana                  1
Niacragua              1
Eucador                1
Suriname               1
Name: count, dtype: int64
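Some of these locations look like misspellings: `Eucador` (1) next to `Ecuador` (54), `Niacragua` (1) next to `Nicaragua` (5), and `Domincan Republic` (5). A per-location average treats each spelling as a separate location. Assuming these are typos, a hypothetical mapping passed to `Series.replace` merges them before grouping:

```python
import pandas as pd

# Hypothetical mapping from apparent misspellings to the intended names
LOCATION_FIXES = {
    "Eucador": "Ecuador",
    "Niacragua": "Nicaragua",
    "Domincan Republic": "Dominican Republic",
}

locations = pd.Series(["Ecuador", "Eucador", "Nicaragua", "Niacragua", "France"])

# replace() with a dict swaps whole matching values and leaves the rest untouched
cleaned = locations.replace(LOCATION_FIXES)

print(cleaned.value_counts()["Ecuador"])  # 2
```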

In [86]:

# Initialize a dictionary to store results
location_stats = {}

# Calculate stats for each location
for location in unique_locations:
    location_data = data[data['Company\nLocation'] == location]
    count = len(location_data)
    if count >= 10:  # Only consider locations with at least 10 reviews
        # Use a separate name so the overall mean_rating from section 4 is not overwritten
        location_mean = location_data["Rating"].mean()
        location_stats[location] = location_mean

In [87]:

location_stats

Out[87]:

{'France': 3.2516025641025643,
 'U.S.A.': 3.1541230366492146,
 'Ecuador': 3.009259259259259,
 'Switzerland': 3.3421052631578947,
 'Spain': 3.27,
 'Peru': 2.8970588235294117,
 'Canada': 3.324,
 'Italy': 3.3253968253968256,
 'Brazil': 3.3970588235294117,
 'U.K.': 3.0546875,
 'Australia': 3.357142857142857,
 'Belgium': 3.09375,
 'Germany': 3.1785714285714284,
 'Venezuela': 3.175,
 'Colombia': 3.1739130434782608,
 'Japan': 3.088235294117647,
 'New Zealand': 3.1911764705882355,
 'Scotland': 3.325,
 'Guatemala': 3.35,
 'Denmark': 3.283333333333333,
 'Vietnam': 3.409090909090909,
 'Madagascar': 3.1470588235294117,
 'Austria': 3.2403846153846154,
 'Hungary': 3.2045454545454546}

In [110]:

# Convert results to a Series and sort
avg_rating_by_location = pd.Series(location_stats).sort_values(ascending=False)

In [109]:

avg_rating_by_location

Out[109]:

Vietnam        3.409091
Brazil         3.397059
Australia      3.357143
Guatemala      3.350000
Switzerland    3.342105
dtype: float64
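The loop in In [86] can also be written with `groupby`, computing each location's review count and mean rating in one pass and then keeping locations with at least `min_reviews` rows. A sketch using a hypothetical function name, `mean_rating_by_location`, on toy data:

```python
import pandas as pd


def mean_rating_by_location(df: pd.DataFrame, location_col: str,
                            rating_col: str = "Rating",
                            min_reviews: int = 10) -> pd.Series:
    """Mean rating per location, keeping only locations with at least `min_reviews` rows."""
    stats = df.groupby(location_col)[rating_col].agg(["size", "mean"])
    kept = stats.loc[stats["size"] >= min_reviews, "mean"]
    return kept.sort_values(ascending=False)


# Hypothetical toy data; on the notebook's DataFrame the call would be
# mean_rating_by_location(data, "Company\nLocation")
toy = pd.DataFrame({
    "Location": ["A"] * 3 + ["B"] * 2 + ["C"] * 3,
    "Rating": [3.0, 3.5, 4.0, 5.0, 5.0, 2.0, 2.5, 3.0],
})

result = mean_rating_by_location(toy, "Location", min_reviews=3)
print(result)  # B is dropped: it has only 2 reviews
```

`"size"` counts rows like `len(location_data)` in the loop; `"count"` would skip missing ratings instead, which makes no difference here because `Rating` has no missing values.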

Statement of Completion#2157942b
