Statement of Completion#6684b5cb
Introduction to Descriptive Statistics
Measures of central tendency
First, import the required libraries.
In [31]:
import numpy as np
import pandas as pd
from collections import Counter
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Let’s illustrate these measures by calculating the mean, median, and mode by hand. Suppose we record the weight of luggage presented by airline passengers at check-in (measured to the nearest kg):
In [2]:
# Define a list of numbers
data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]
# Calculate the mean
mean = sum(data) / len(data)
print("Mean:", mean)
# Calculate the median
sorted_data = sorted(data)
n = len(sorted_data)
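if n % 2 == 0:
    # Even number of values: average the two middle values
    median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2
else:
    # Odd number of values: take the middle value
    median = sorted_data[n // 2]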
print("Median:", median)
# Calculate the mode
c = Counter(data)
mode = c.most_common(1)[0][0]
print("Mode:", mode)
Mean: 20.6
Median: 20.0
Mode: 20
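As a cross-check on the hand-written calculation above, the same three measures can be obtained from Python's standard-library statistics module. This is only a sketch for comparison; the variable names stat_mean, stat_median, stat_mode, and all_modes are introduced here for illustration and are not part of the original notebook.

```python
import statistics

# Same luggage weights (kg) as in the cell above
data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]

stat_mean = statistics.mean(data)      # arithmetic mean
stat_median = statistics.median(data)  # averages the two middle values when n is even
stat_mode = statistics.mode(data)      # most common value
all_modes = statistics.multimode(data) # list of every value tied for most common

print("Mean:", stat_mean)
print("Median:", stat_median)
print("Mode:", stat_mode)
print("All modes:", all_modes)
```

statistics.multimode is useful when several values are tied for most frequent, a case the Counter-based approach above resolves silently by returning only one of them.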
We could also use numpy and scipy to compute these measures:
In [3]:
import numpy as np
from scipy import stats
mean_ = np.mean(data)
median_ = np.median(data)
mode_ = stats.mode(data)
print('mean', mean_, 'median', median_, 'mode', mode_)
mean 20.6 median 20.0 mode ModeResult(mode=20, count=2)
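Note that stats.mode returns a ModeResult object rather than a plain number. The sketch below shows one way to pull out the modal value and its count. It assumes only that the result exposes .mode and .count attributes; wrapping them in np.atleast_1d is a defensive choice made here so the same line works whether SciPy returns scalars (as in the output above) or one-element arrays (as older releases did). The names mode_value and mode_count are illustrative.

```python
import numpy as np
from scipy import stats

data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]
result = stats.mode(data)

# np.atleast_1d turns a scalar into a 1-element array and leaves arrays
# unchanged, so indexing with [0] is safe in either case.
mode_value = np.atleast_1d(result.mode)[0]
mode_count = np.atleast_1d(result.count)[0]

print("mode:", mode_value, "count:", mode_count)
```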
Your task
1- Ten patients at a doctor’s surgery wait the following lengths of time to see their doctor. What are the mean, median, and mode for these data?
Dataset = [5 mins, 20 mins, 28 mins, 2 mins, 5 mins, 9 mins, 62 mins, 11 mins, 16 mins, 5 mins]
In [15]:
Dataset = [5, 20, 28, 2, 5, 9, 62, 11, 16, 5]
mean = np.mean(Dataset)
median = np.median(Dataset)
mode_ = stats.mode(Dataset)
# Read the modal value from the result just computed (mode_, not the earlier variable mode)
mean, median, mode_.mode
Out[15]:
(16.3, 10.0, 5)
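The mean (16.3) is well above the median (10.0) here because a single long wait of 62 minutes pulls the mean upward. The sketch below recomputes both measures with that one value removed, to show how much more sensitive the mean is. Dropping the maximum is purely an illustration, not a recommended way to clean data, and the names waits and trimmed are introduced here.

```python
import numpy as np

# Waiting times in minutes, copied from the task above
waits = [5, 20, 28, 2, 5, 9, 62, 11, 16, 5]

# Illustration only: drop the single largest value (62 minutes)
trimmed = [w for w in waits if w != max(waits)]

mean_all, median_all = np.mean(waits), np.median(waits)
mean_trimmed, median_trimmed = np.mean(trimmed), np.median(trimmed)

print("mean:  ", mean_all, "->", round(mean_trimmed, 2))
print("median:", median_all, "->", median_trimmed)
```

The mean shifts by about five minutes while the median moves by only one, which is why the median is often preferred for skewed data such as waiting times.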
2- The "Datasaurus Dozen" dataset contains 13 datasets, each with a different shape but the same summary statistics. Import datasaurus.csv using pandas as df.
Once you have imported the dataset, you can compute the mean for each dataset.
In [16]:
df=pd.read_csv('datasaurus.csv')
In [17]:
sns.relplot(data=df, x='x', y='y', col='dataset', col_wrap=4);
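The task also asks for the mean of each dataset, which the cells above do not compute. One way to do this is a pandas groupby on the dataset column. The sketch below uses a tiny made-up frame named toy so that it runs without the CSV file; only the column names dataset, x, and y are taken from the plotting call above, and the values are invented for illustration.

```python
import pandas as pd

# Made-up stand-in with the same column names as datasaurus.csv;
# the numbers are NOT taken from the real file.
toy = pd.DataFrame({
    "dataset": ["dino", "dino", "star", "star"],
    "x": [1.0, 3.0, 2.0, 2.0],
    "y": [4.0, 6.0, 5.0, 5.0],
})

# One row per dataset, with the mean of x and y in each group
group_means = toy.groupby("dataset")[["x", "y"]].mean()
print(group_means)
```

With the real file loaded, the same call on df (that is, df.groupby("dataset")[["x", "y"]].mean()) should give one row of means per dataset, which can then be compared across the 13 groups.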
3- For this task, we will explore a real-world dataset. Load the dataset flights.csv and store it in the variable flights. It contains over 300,000 observations of flights departing NYC in 2013.
We will focus on a single variable, the arrival delay of each flight. Arrival delays are in minutes, and negative values mean the flight arrived early.
What are the mean, median, and mode for these data? Store them in f_mean, f_median, and f_mode.
In [41]:
flights=pd.read_csv("flights.csv")
# Keep the whole arrival-delay column; indexing with [0] would keep only the first value
flights = flights['arr_delay']
In [46]:
import pandas as pd
import numpy as np
from scipy import stats
# Load the dataset
flights = pd.read_csv('flights.csv')
# Display the first few rows to understand the structure
print(flights.head())
# Extract the 'arr_delay' column
arrival_delays = flights['arr_delay']
# Remove any missing values
arrival_delays = arrival_delays.dropna()
# Calculate the mean
f_mean = np.mean(arrival_delays)
# Calculate the median
f_median = np.median(arrival_delays)
# Calculate the mode
# In recent SciPy, ModeResult.mode is already a scalar, so indexing it with [0][0] raises IndexError
f_mode = stats.mode(arrival_delays).mode
f_mean, f_median, f_mode
   Unnamed: 0  arr_delay                    name
0           0       11.0   United Air Lines Inc.
1           1       20.0   United Air Lines Inc.
2           2       33.0  American Airlines Inc.
3           3      -18.0         JetBlue Airways
4           4      -25.0    Delta Air Lines Inc.
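The dropna step in the cell above matters because NumPy's median propagates missing values. As an alternative sketch, the three measures can also be taken with pandas' own Series methods, which skip NaN by default. The Series named delays below holds made-up values chosen for illustration, not rows from flights.csv, and the p_ names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up arrival delays in minutes, including one missing value
delays = pd.Series([11.0, 20.0, np.nan, -18.0, -25.0, 20.0])

# NumPy's median propagates NaN, so the raw array gives nan
raw_median = np.median(delays.to_numpy())

# pandas methods skip NaN by default
p_mean = delays.mean()
p_median = delays.median()
# Series.mode() returns every tied mode as a Series; take the first one
p_mode = delays.mode().iloc[0]

print(raw_median, p_mean, p_median, p_mode)
```

Because Series.mode() returns all tied values, it also makes it easy to check whether the mode is unique before storing a single number.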
In [43]:
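# Work on the cleaned arrival-delay column rather than the whole flights table
f_mean = np.mean(arrival_delays)
f_median = np.median(arrival_delays)
f_mode = stats.mode(arrival_delays).mode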