Statement of Completion#8ee7eb03
Intro to Pandas for Data Analysis
Exploring DataFrames: Uncovering Insights from Top 30 US Fast Food Chains
Exploring DataFrames: Uncovering Insights from Top 30 US Fast Food Chains¶
In [21]:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Loading dataset as a dataframe
df = pd.read_csv('Top 30 US fast food chains.csv')
Let's get started
In [22]:
# Start by looking at the first 5 records of the dataframe.
df.head()
Out[22]:
 | Rank | Chain | Sales (U.S., 2017) | # of Locations (U.S.) |
---|---|---|---|---|
0 | 1 | McDonald's | 37500000000 | 14,036 |
1 | 2 | Starbucks | 13200000000 | 13,930 |
2 | 3 | Subway | 10800000000 | 25,908 |
3 | 4 | Burger King | 9800000000 | 7,226 |
4 | 5 | Taco Bell | 9300000000 | 6,446 |
In [23]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Rank                   30 non-null     int64
 1   Chain                  30 non-null     object
 2   Sales (U.S., 2017)     30 non-null     int64
 3   # of Locations (U.S.)  30 non-null     object
dtypes: int64(2), object(2)
memory usage: 1.1+ KB
In [24]:
df.describe()
Out[24]:
 | Rank | Sales (U.S., 2017) |
---|---|---|
count | 30.000000 | 3.000000e+01 |
mean | 15.500000 | 5.740000e+09 |
std | 8.803408 | 6.801095e+09 |
min | 1.000000 | 1.100000e+09 |
25% | 8.250000 | 2.225000e+09 |
50% | 15.500000 | 3.600000e+09 |
75% | 22.750000 | 5.900000e+09 |
max | 30.000000 | 3.750000e+10 |
1. What is the primary difference between the df.info() and df.describe() methods in pandas?¶
In [ ]:
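As a hedged illustration for question 1, the sketch below rebuilds only the five rows shown by df.head() (the name `sample` is introduced here and is not part of the project) and calls both methods. Broadly, info() reports structure (index range, column names, non-null counts, dtypes, memory use), while describe() returns summary statistics for the numeric columns.

```python
import io
import pandas as pd

# Hypothetical five-row sample copied from the df.head() output above.
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
    "Sales (U.S., 2017)": [37500000000, 13200000000, 10800000000,
                           9800000000, 9300000000],
    "# of Locations (U.S.)": ["14,036", "13,930", "25,908", "7,226", "6,446"],
})

# info() writes a structural summary to a buffer (stdout by default)
# and returns None.
buffer = io.StringIO()
info_result = sample.info(buf=buffer)
structure_report = buffer.getvalue()
print(structure_report)

# describe() returns a new DataFrame of statistics; by default it covers
# numeric columns only, which is why Chain and the locations column are absent.
stats = sample.describe()
print(stats)
```

Because describe() returns a DataFrame, its values can be indexed directly, for example `stats.loc["mean", "Rank"]`.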
2. What is the data type of the Rank column?¶
In [25]:
df['Rank'].dtype
Out[25]:
dtype('int64')
3. Which of these columns contain numeric data?¶
In [26]:
df.select_dtypes(include=['int64', 'float64']).columns
Out[26]:
Index(['Rank', 'Sales (U.S., 2017)'], dtype='object')
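Listing 'int64' and 'float64' explicitly works here, but it would miss other numeric dtypes such as int32. A slightly more general variant, sketched on a hypothetical three-row `sample` rather than the real file, passes the generic "number" selector to select_dtypes.

```python
import pandas as pd

# Hypothetical sample: the first three rows from the df.head() output.
sample = pd.DataFrame({
    "Rank": [1, 2, 3],
    "Chain": ["McDonald's", "Starbucks", "Subway"],
    "Sales (U.S., 2017)": [37500000000, 13200000000, 10800000000],
    "# of Locations (U.S.)": ["14,036", "13,930", "25,908"],
})

# "number" matches every numeric dtype, not just int64 and float64.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()

# The locations column holds strings such as "14,036", so it is not
# numeric until the commas are removed and the values are converted.
text_columns = sample.select_dtypes(exclude="number").columns.tolist()

print(numeric_columns)
print(text_columns)
```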
4. What is the shape of our DataFrame df?¶
In [27]:
df.shape
Out[27]:
(30, 4)
Diving Deeper¶
5. Select the Sales Column.¶
In [29]:
sales = df["Sales (U.S., 2017)"]
sales
Out[29]:
0     37500000000
1     13200000000
2     10800000000
3      9800000000
4      9300000000
5      9300000000
6      5900000000
7      9000000000
8      5900000000
9      5500000000
10     4500000000
11     4500000000
12     4400000000
13     4400000000
14     3600000000
15     3600000000
16     3500000000
17     3500000000
18     3200000000
19     3100000000
20     2300000000
21     2300000000
22     2200000000
23     2100000000
24     2100000000
25     1500000000
26     1400000000
27     1400000000
28     1300000000
29     1100000000
Name: Sales (U.S., 2017), dtype: int64
Just For Exploration¶
Let's visualise the Sales share of each food chain. Run the cell below to find out!
In [ ]:
# Sales Share by Chain Pie Chart
plt.figure(figsize=(10, 10))
plt.pie(df['Sales (U.S., 2017)'], labels=df['Chain'], autopct='%1.1f%%', startangle=140)
plt.title('Market Share of Top 30 US Fast Food Chains in 2017')
plt.show()
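The pie chart shows McDonald's leading in sales by a wide margin: its $37.5 billion is about 21.8% of the combined 2017 sales of these 30 chains ($172.2 billion), nearly three times the figure for second-place Starbucks. Keep in mind that this is a share of the top-30 total, not of the whole fast food market. 🍔✨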
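The percentages that the pie chart prints can also be computed directly. This is a minimal sketch that assumes the 30 sales figures listed in the output of question 5; the names `share_pct` and `top_share` are introduced here for illustration.

```python
import pandas as pd

# Sales figures copied from the output of question 5, in row order.
sales = pd.Series([
    37500000000, 13200000000, 10800000000, 9800000000, 9300000000,
    9300000000, 5900000000, 9000000000, 5900000000, 5500000000,
    4500000000, 4500000000, 4400000000, 4400000000, 3600000000,
    3600000000, 3500000000, 3500000000, 3200000000, 3100000000,
    2300000000, 2300000000, 2200000000, 2100000000, 2100000000,
    1500000000, 1400000000, 1400000000, 1300000000, 1100000000,
])

# Each chain's share of the combined sales of these 30 chains, in percent.
share_pct = sales / sales.sum() * 100

# McDonald's is the first row of the dataset.
top_share = round(share_pct.iloc[0], 1)
print(top_share)
```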
6. Display Top Three Chains¶
In [31]:
top_3_chains = df.head(3)['Chain']
top_3_chains
Out[31]:
0    McDonald's
1     Starbucks
2        Subway
Name: Chain, dtype: object
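df.head(3) returns the top three only because the file happens to be ordered by rank. If that ordering cannot be assumed, one possible alternative is nlargest, sketched here on a deliberately shuffled, hypothetical `sample`.

```python
import pandas as pd

# Hypothetical sample: the five chains from df.head(), in shuffled order.
sample = pd.DataFrame({
    "Chain": ["Taco Bell", "Subway", "McDonald's", "Burger King", "Starbucks"],
    "Sales (U.S., 2017)": [9300000000, 10800000000, 37500000000,
                           9800000000, 13200000000],
})

# nlargest sorts by the given column and keeps the first n rows,
# so the result does not depend on the original row order.
top_3_by_sales = sample.nlargest(3, "Sales (U.S., 2017)")["Chain"].tolist()
print(top_3_by_sales)
```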
7. Identify the 5th Ranking Fast Food Chain¶
In [33]:
df.loc[4, 'Chain']
Out[33]:
'Taco Bell'
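df.loc[4, 'Chain'] relies on the default RangeIndex, where label 4 happens to hold the chain ranked 5th. A sketch that filters on the Rank column instead, using a hypothetical five-row `sample`, keeps working after the rows are reordered.

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head().
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
})

# Filter on the Rank value itself rather than on the row label.
fifth_chain = sample.loc[sample["Rank"] == 5, "Chain"].iloc[0]

# The same lookup still works after the rows are sorted alphabetically.
reordered = sample.sort_values("Chain")
fifth_after_sort = reordered.loc[reordered["Rank"] == 5, "Chain"].iloc[0]

print(fifth_chain, fifth_after_sort)
```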
8. How many chains have more than 5000 locations?¶
In [35]:
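# Remove the thousands separators (e.g. '14,036') and convert the column to integers.
# Run this cell only once: after the conversion the column no longer holds strings,
# so calling .str on it again raises an error.
df['# of Locations (U.S.)'] = df['# of Locations (U.S.)'].str.replace(',', '').astype(int)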
In [36]:
(df['# of Locations (U.S.)'] > 5000).sum()
Out[36]:
9
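An alternative to cleaning the column after loading is to let read_csv parse the separators through its thousands parameter. The sketch below assumes the raw file stores the counts as quoted strings such as "14,036" (inferred from the object dtype, not confirmed), and uses an in-memory, five-row stand-in for the CSV.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: only the five rows from df.head().
csv_text = '''Rank,Chain,"Sales (U.S., 2017)",# of Locations (U.S.)
1,McDonald's,37500000000,"14,036"
2,Starbucks,13200000000,"13,930"
3,Subway,10800000000,"25,908"
4,Burger King,9800000000,"7,226"
5,Taco Bell,9300000000,"6,446"
'''

# thousands="," tells the parser to treat commas inside numbers as separators,
# so the locations column is read as integers from the start.
sample = pd.read_csv(io.StringIO(csv_text), thousands=",")
locations = sample["# of Locations (U.S.)"]

# Count of chains in this five-row sample with more than 5000 locations.
over_5000 = int((locations > 5000).sum())
print(locations.dtype, over_5000)
```

All five sample rows exceed 5000, so the count here differs from the full-dataset answer above.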
9. Analyse Sales Distribution Across Food Chains¶
In [38]:
median_sales = df['Sales (U.S., 2017)'].median()
median_sales
Out[38]:
3600000000.0
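The median on its own says little about the shape of the distribution. Comparing it with the mean from df.describe() is one simple check: a mean well above the median suggests a right-skewed distribution pulled up by a few very large chains. A minimal sketch, again assuming the 30 sales figures listed under question 5:

```python
import pandas as pd

# Sales figures copied from the output of question 5, in row order.
sales = pd.Series([
    37500000000, 13200000000, 10800000000, 9800000000, 9300000000,
    9300000000, 5900000000, 9000000000, 5900000000, 5500000000,
    4500000000, 4500000000, 4400000000, 4400000000, 3600000000,
    3600000000, 3500000000, 3500000000, 3200000000, 3100000000,
    2300000000, 2300000000, 2200000000, 2100000000, 2100000000,
    1500000000, 1400000000, 1400000000, 1300000000, 1100000000,
])

mean_sales = sales.mean()
median_sales = sales.median()

# How many chains sit above the mean: a small count relative to 30
# is another sign that a few large values dominate.
above_mean = int((sales > mean_sales).sum())

print(mean_sales, median_sales, above_mean)
```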
10. If you want to select the third row from the DataFrame df using positional indexing, which of the following would be correct?¶
In [ ]:
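The answer options are not shown here, but positional indexing in pandas goes through iloc, which counts from zero, so the third row sits at position 2. The sketch below uses a hypothetical five-row `sample` and contrasts iloc with the label-based loc once the index is no longer the default one.

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head().
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
})

# iloc is position-based and zero-indexed: position 2 is the third row.
third_row = sample.iloc[2]

# With Rank as the index, labels and positions no longer coincide:
# iloc[2] is still the third row, while loc[3] looks up the label 3.
by_rank = sample.set_index("Rank")
third_by_position = by_rank.iloc[2]["Chain"]
third_by_label = by_rank.loc[3, "Chain"]

print(third_row["Chain"], third_by_position, third_by_label)
```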