Statement of Completion#45702d62
Intro to Pandas for Data Analysis
Exploring DataFrames: Uncovering Insights from Top 30 US Fast Food Chains
Exploring DataFrames: Uncovering Insights from Top 30 US Fast Food Chains¶
In [1]:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Loading dataset as a dataframe
df = pd.read_csv('Top 30 US fast food chains.csv')
Let's get started¶
In [2]:
# Start by looking at the first 5 records of the dataframe.
df.head()
Out[2]:
| | Rank | Chain | Sales (U.S., 2017) | # of Locations (U.S.) |
|---|---|---|---|---|
| 0 | 1 | McDonald's | 37500000000 | 14,036 |
| 1 | 2 | Starbucks | 13200000000 | 13,930 |
| 2 | 3 | Subway | 10800000000 | 25,908 |
| 3 | 4 | Burger King | 9800000000 | 7,226 |
| 4 | 5 | Taco Bell | 9300000000 | 6,446 |
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   Rank                   30 non-null     int64
 1   Chain                  30 non-null     object
 2   Sales (U.S., 2017)     30 non-null     int64
 3   # of Locations (U.S.)  30 non-null     object
dtypes: int64(2), object(2)
memory usage: 1.1+ KB
In [4]:
df.describe()
Out[4]:
| | Rank | Sales (U.S., 2017) |
|---|---|---|
| count | 30.000000 | 3.000000e+01 |
| mean | 15.500000 | 5.740000e+09 |
| std | 8.803408 | 6.801095e+09 |
| min | 1.000000 | 1.100000e+09 |
| 25% | 8.250000 | 2.225000e+09 |
| 50% | 15.500000 | 3.600000e+09 |
| 75% | 22.750000 | 5.900000e+09 |
| max | 30.000000 | 3.750000e+10 |
1. What is the primary difference between the df.info() and df.describe() methods in pandas?¶
In [ ]:
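The two outputs above already hint at the answer: df.info() reports structure (column names, non-null counts, dtypes, memory usage), while df.describe() returns summary statistics for the numeric columns. A minimal sketch of that contrast, assuming a five-row `sample` DataFrame rebuilt from the df.head() output rather than the full CSV:

```python
import io

import pandas as pd

# First five rows copied from the df.head() output (a subset, not the full dataset).
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
    "Sales (U.S., 2017)": [37500000000, 13200000000, 10800000000,
                           9800000000, 9300000000],
    "# of Locations (U.S.)": ["14,036", "13,930", "25,908", "7,226", "6,446"],
})

# info() writes a structural summary to a text stream and returns None.
buffer = io.StringIO()
sample.info(buf=buffer)
info_text = buffer.getvalue()

# describe() returns a new DataFrame of statistics; by default it keeps only
# the numeric columns when the frame contains any.
stats = sample.describe()

print(info_text)
print(stats)
```

Because the location counts are stored as strings with commas, they show up in the info() listing but are left out of the describe() table.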
2. What is the data type of the Rank column?¶
In [ ]:
# Find the datatype of the column `Rank`.
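One way to check this directly is the `dtype` attribute of the selected column. The sketch below assumes a small `sample` frame built from the first five rows shown earlier; in the notebook itself, df.info() lists Rank as int64.

```python
import pandas as pd

# Two columns from the first five rows of df.head() (a subset for illustration).
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
})

# A single column is a Series, and its dtype attribute holds the data type.
rank_dtype = sample["Rank"].dtype
print(rank_dtype)

# dtypes (plural) on the DataFrame lists the type of every column at once.
print(sample.dtypes)
```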
3. Which of these columns contain numeric data?¶
In [ ]:
# Find the numeric columns in the dataframe.
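A possible approach is `select_dtypes`, which filters columns by data type. This is a sketch on a five-row `sample` frame rebuilt from the df.head() output, not necessarily the solution the course expects:

```python
import pandas as pd

# First five rows copied from the df.head() output (a subset, not the full dataset).
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
    "Sales (U.S., 2017)": [37500000000, 13200000000, 10800000000,
                           9800000000, 9300000000],
    "# of Locations (U.S.)": ["14,036", "13,930", "25,908", "7,226", "6,446"],
})

# Keep only the columns whose dtype is numeric, then list their names.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```

The locations column looks numeric but is stored as text because of the thousands separators, so it is not selected until it has been converted.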
4. What is the shape of our DataFrame df?¶
In [5]:
# Find the shape of the dataframe.
df.shape
Out[5]:
(30, 4)
Diving Deeper¶
5. Select the Sales Column.¶
In [7]:
# Select the sales column.
sales = df['Sales (U.S., 2017)']
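Single brackets, as used here, return a Series; double brackets return a one-column DataFrame. A short sketch of the difference, assuming a `sample` frame built from the first five rows of the dataset:

```python
import pandas as pd

# Two columns from the first five rows of df.head() (a subset for illustration).
sample = pd.DataFrame({
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
    "Sales (U.S., 2017)": [37500000000, 13200000000, 10800000000,
                           9800000000, 9300000000],
})

# Single brackets with one label select a Series (one-dimensional).
sales_series = sample["Sales (U.S., 2017)"]

# A list of labels selects a DataFrame, even when the list has one entry.
sales_frame = sample[["Sales (U.S., 2017)"]]

print(type(sales_series).__name__, sales_series.shape)
print(type(sales_frame).__name__, sales_frame.shape)
```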
Just For Exploration¶
Let's visualise the Sales share of each food chain. Run the cell below to find out!
In [9]:
# Sales Share by Chain Pie Chart
plt.figure(figsize=(10, 10))
plt.pie(df['Sales (U.S., 2017)'], labels=df['Chain'], autopct='%1.1f%%', startangle=140)
plt.title('Market Share of Top 30 US Fast Food Chains in 2017')
plt.show()
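The pie chart shows that McDonald's leads in sales by a wide margin: its 37.5 billion is about 21.8% of the combined 2017 sales of these 30 chains, and nearly three times the 13.2 billion of second-placed Starbucks. Note that the percentages are shares of the top 30 chains' combined sales, not of the whole fast food market. The gap is consistent with a strong brand presence among consumers, although the data alone does not explain it. 🍔✨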
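McDonald's slice of the pie can be cross-checked by hand from numbers already printed above: df.describe() gives the count and mean of the sales column (so their product is the total), and df.head() gives McDonald's sales. A small sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Values read from the df.describe() and df.head() outputs.
chain_count = 30          # count of rows
mean_sales = 5.74e9       # mean of 'Sales (U.S., 2017)'
mcdonalds_sales = 37.5e9  # first row of df.head()

# The total is the mean multiplied by the number of rows.
total_sales = mean_sales * chain_count

# Share of the combined top 30 sales, in percent.
mcdonalds_share = mcdonalds_sales / total_sales * 100
print(f"{mcdonalds_share:.1f}%")  # 21.8%
```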
6. Display Top Three Chains¶
In [10]:
df.columns
Out[10]:
Index(['Rank', 'Chain', 'Sales (U.S., 2017)', '# of Locations (U.S.)'], dtype='object')
In [15]:
top_3_chains = df.sort_values(by='Rank')['Chain'].head(3)
In [16]:
top_3_chains
Out[16]:
0    McDonald's
1     Starbucks
2        Subway
Name: Chain, dtype: object
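An alternative worth knowing is `nsmallest`, which picks the rows with the lowest values in a column without sorting the whole frame first. The sketch below assumes a five-row `sample` frame from the df.head() output, deliberately stored in reverse order to show that the result does not depend on row order:

```python
import pandas as pd

# Five rows from df.head(), listed in reverse rank order on purpose.
sample = pd.DataFrame({
    "Rank": [5, 4, 3, 2, 1],
    "Chain": ["Taco Bell", "Burger King", "Subway", "Starbucks", "McDonald's"],
})

# nsmallest(3, "Rank") returns the three rows with the lowest Rank values,
# ordered from smallest to largest.
top_3 = sample.nsmallest(3, "Rank")["Chain"].tolist()
print(top_3)

# The sort-then-head approach used in the notebook gives the same answer.
top_3_sorted = sample.sort_values(by="Rank")["Chain"].head(3).tolist()
print(top_3_sorted)
```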
7. Identify the 5th Ranking Fast Food Chain¶
In [19]:
df.loc[df['Rank'] == 5]
Out[19]:
| | Rank | Chain | Sales (U.S., 2017) | # of Locations (U.S.) |
|---|---|---|---|---|
| 4 | 5 | Taco Bell | 9300000000 | 6,446 |
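The boolean filter returns a one-row DataFrame. If only the chain name is needed, `.loc` also accepts a column label, and `.iloc[0]` then pulls out the single value. A sketch on a five-row `sample` frame rebuilt from the df.head() output:

```python
import pandas as pd

# Two columns from the first five rows of df.head() (a subset for illustration).
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
})

# Row filter plus column label gives a Series with the matching chain names.
matches = sample.loc[sample["Rank"] == 5, "Chain"]

# Take the first (and here only) match as a plain string.
fifth_chain = matches.iloc[0]
print(fifth_chain)
```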
8. How many chains have more than 5000 locations?¶
In [20]:
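# Remove the thousands separators and convert the column from strings to integers.
# Run this cell only once: the .str accessor fails once the column is numeric.
df['# of Locations (U.S.)'] = df['# of Locations (U.S.)'].str.replace(',', '').astype(int)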
In [21]:
# find out number of chains having more than 5000 locations
len(df.loc[df['# of Locations (U.S.)'] > 5000])
Out[21]:
9
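The comma clean-up can also happen at load time: `pd.read_csv` has a `thousands` parameter that strips the separator while parsing. The sketch below uses a hypothetical `csv_text` string reconstructed from the five rows of df.head(); the layout and quoting of the real CSV file are assumptions. Summing a boolean mask is another way to count matching rows.

```python
import io

import pandas as pd

# Hypothetical CSV text rebuilt from df.head(); the real file's quoting may differ.
csv_text = '''Rank,Chain,"Sales (U.S., 2017)",# of Locations (U.S.)
1,McDonald's,37500000000,"14,036"
2,Starbucks,13200000000,"13,930"
3,Subway,10800000000,"25,908"
4,Burger King,9800000000,"7,226"
5,Taco Bell,9300000000,"6,446"
'''

# thousands="," lets the parser read "14,036" as the integer 14036.
sample = pd.read_csv(io.StringIO(csv_text), thousands=",")
locations = sample["# of Locations (U.S.)"]

# True counts as 1 when summed, so this counts the rows above the threshold.
over_5000 = int((locations > 5000).sum())
print(locations.dtype, over_5000)
```

All five rows of this subset exceed 5000 locations; on the full 30-row dataset the notebook reports 9.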
9. Analyse Sales Distribution Across Food Chains¶
In [22]:
median_sales = df['Sales (U.S., 2017)'].median()
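Comparing the median with the mean is a quick way to read the shape of a distribution. In the df.describe() output above, the mean (5.74e9) is well above the median (3.6e9), which suggests a right-skewed distribution pulled up by a few very large chains. A sketch of the same comparison on the five sales values from df.head(), so the numbers differ from the full dataset:

```python
import pandas as pd

# Sales of the first five chains from df.head() (a subset, not the full column).
sales = pd.Series(
    [37500000000, 13200000000, 10800000000, 9800000000, 9300000000],
    name="Sales (U.S., 2017)",
)

median_sales = sales.median()  # middle value of the sorted data
mean_sales = sales.mean()      # arithmetic average, sensitive to outliers

print(f"median: {median_sales:.3e}")
print(f"mean:   {mean_sales:.3e}")

# A mean above the median points to a long right tail.
print("right-skewed:", mean_sales > median_sales)
```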
10. If you want to select the third row from the DataFrame df using positional indexing, which of the following would be correct?¶
In [ ]:
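Positional indexing in pandas goes through `iloc`, which counts from zero, so the third row is position 2. The sketch below, on a five-row `sample` frame from the df.head() output, also contrasts it with label-based `loc` after the rows have been reordered:

```python
import pandas as pd

# Two columns from the first five rows of df.head() (a subset for illustration).
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Chain": ["McDonald's", "Starbucks", "Subway", "Burger King", "Taco Bell"],
})

# iloc is position-based and zero-indexed: position 2 is the third row.
third_row = sample.iloc[2]
print(third_row["Chain"])

# After sorting by name, index labels no longer match positions.
by_name = sample.sort_values("Chain")
third_by_position = by_name.iloc[2]["Chain"]  # third row in the new order
row_with_label_2 = by_name.loc[2]["Chain"]    # row whose index label is 2
print(third_by_position, row_with_label_2)
```

With the default RangeIndex, `df.iloc[2]` and `df.loc[2]` happen to return the same row; they diverge as soon as the index is reordered or replaced.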