Measures of Central Tendency

Finished

July 23, 2024 9:01 PM

Measures of central tendency

First, import the required libraries:

In [31]:

import numpy as np
import pandas as pd
from collections import Counter
from scipy.stats import norm
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Let’s illustrate these measures by calculating the mean, median, and mode. Suppose we record the weight of luggage presented by airline passengers at check-in (measured to the nearest kg):

In [2]:

# Define a list of numbers
data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]

# Calculate the mean
mean = sum(data) / len(data)
print("Mean:", mean)

# Calculate the median
sorted_data = sorted(data)
n = len(sorted_data)
if n % 2 == 0:
    median = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2
else:
    median = sorted_data[n // 2]
print("Median:", median)

# Calculate the mode
c = Counter(data)
mode = c.most_common(1)[0][0]
print("Mode:", mode)

Mean: 20.6
Median: 20.0
Mode: 20
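The Counter approach above reports a single value even when several values are tied for most frequent. As a cross-check, here is a minimal sketch using Python's standard-library statistics module, which the original cells do not use. The `tied` list is made-up data for illustration only.

```python
import statistics

# Same luggage weights (kg) as in the cell above
data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]

mean = statistics.mean(data)        # arithmetic mean
median = statistics.median(data)    # average of the two middle values when n is even
modes = statistics.multimode(data)  # list of every most-frequent value

print("Mean:", mean)
print("Median:", median)
print("Modes:", modes)

# With a tie, every tied value is reported (illustrative data, not from the notebook)
tied = [1, 1, 2, 2, 3]
tied_modes = statistics.multimode(tied)
print("Tied modes:", tied_modes)
```

If only one mode is needed, `modes[0]` gives the first one encountered in the data.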

We could also use NumPy and SciPy to compute these measures:

In [3]:

import numpy as np
from scipy import stats

mean_=np.mean(data)
median_=np.median(data)
mode_=stats.mode(data)

print('mean',mean_,'median',median_,'mode',mode_)

mean 20.6 median 20.0 mode ModeResult(mode=20, count=2)
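The printed ModeResult bundles the mode with its count. The sketch below pulls the two fields out as plain integers. Depending on the SciPy version, the fields may be scalars or length-1 arrays, so `np.atleast_1d` is used here to handle both cases. The names `mode_value` and `mode_count` are illustrative.

```python
import numpy as np
from scipy import stats

# Same luggage weights (kg) as above
data = [10, 12, 15, 17, 20, 20, 22, 25, 30, 35]
result = stats.mode(data)

# np.atleast_1d makes this work whether the fields are scalars or length-1 arrays
mode_value = int(np.atleast_1d(result.mode)[0])
mode_count = int(np.atleast_1d(result.count)[0])

print("mode:", mode_value, "count:", mode_count)
```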

Your task

1- Ten patients at a doctor’s surgery wait for the following lengths of time to see their doctor. What are the mean, median, and mode for these data?

Dataset = [5 mins, 20 mins, 28 mins, 2 mins, 5 mins, 9 mins, 62 mins, 11 mins, 16 mins, 5 mins]

In [15]:

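# Waiting times in minutes
Dataset = [5, 20, 28, 2, 5, 9, 62, 11, 16, 5]
mean = np.mean(Dataset)
median = np.median(Dataset)
mode_ = stats.mode(Dataset)
# Index the ModeResult computed above (mode_), not the earlier integer variable mode
mean, median, mode_[0]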

Out[15]:

(16.3, 10.0, 5)
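The mean (16.3) sits well above the median (10.0) here, largely because of the single 62-minute wait. As a rough sketch of that effect, the code below recomputes both measures with the longest wait removed. The removal is only for illustration and is not part of the task.

```python
import statistics

# Waiting times in minutes, as given in the task
waits = [5, 20, 28, 2, 5, 9, 62, 11, 16, 5]

# Drop the single longest wait (62 minutes) to see how each measure reacts
longest = max(waits)
without_longest = [w for w in waits if w != longest]

mean_all = statistics.mean(waits)
median_all = statistics.median(waits)
mean_trimmed = statistics.mean(without_longest)
median_trimmed = statistics.median(without_longest)

print("all patients:         mean", round(mean_all, 2), "median", median_all)
print("without longest wait: mean", round(mean_trimmed, 2), "median", median_trimmed)
```

The mean drops by about five minutes while the median moves by one, which is why the median is often preferred for skewed data.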

2- The "Datasaurus Dozen" dataset contains 13 datasets, each with a different shape but nearly identical summary statistics. Import datasaurus.csv with pandas and store it in df.

Once you have imported the dataset, you can compute the mean for each dataset.

In [16]:

df=pd.read_csv('datasaurus.csv')

In [17]:

sns.relplot(data=df, x='x', y='y', col='dataset', col_wrap=4);

[Figure: scatter plots of y against x, one panel per dataset, four panels per row]
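The task asks for the mean of each dataset, which the cells above do not compute. One way to do it is a group-by on the dataset column. The sketch below uses a small made-up frame, `df_demo`, with the same column names (dataset, x, y) so that it runs without the CSV file.

```python
import pandas as pd

# Small made-up stand-in for datasaurus.csv with the same column names;
# the values are illustrative only.
df_demo = pd.DataFrame({
    "dataset": ["a", "a", "a", "b", "b", "b"],
    "x": [1.0, 2.0, 3.0, 0.0, 2.0, 4.0],
    "y": [4.0, 5.0, 6.0, 5.0, 5.0, 5.0],
})

# Mean of x and y within each dataset
means = df_demo.groupby("dataset")[["x", "y"]].mean()
print(means)
```

With the real file loaded as df, the same call would be `df.groupby('dataset')[['x', 'y']].mean()`. In the toy frame, the two groups share the same means despite holding different points, which is the pattern the Datasaurus Dozen is meant to show.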

3- For this task, we will explore a real-world dataset. Load the dataset flights.csv, which contains over 300,000 observations of flights departing NYC in 2013, and store it in the variable flights.

We will focus on a single variable: the arrival delay of each flight, in minutes. Negative values mean the flight arrived early.

What are the mean, median, and mode for these data? Store them in f_mean, f_median, and f_mode.

In [41]:

flights=pd.read_csv("flights.csv")
# Keep flights as the full DataFrame; store the arrival-delay column separately
arr_delay=flights['arr_delay']

In [46]:

import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset
flights = pd.read_csv('flights.csv')

# Display the first few rows to understand the structure
print(flights.head())

# Extract the 'arr_delay' column
arrival_delays = flights['arr_delay']

# Remove any missing values
arrival_delays = arrival_delays.dropna()

# Calculate the mean
f_mean = np.mean(arrival_delays)

# Calculate the median
f_median = np.median(arrival_delays)

# Calculate the mode; in this SciPy version the result's .mode field is already a scalar,
# so indexing it with [0][0] raises "IndexError: invalid index to scalar variable"
f_mode = stats.mode(arrival_delays).mode

f_mean, f_median, f_mode

   Unnamed: 0  arr_delay                    name
0           0       11.0   United Air Lines Inc.
1           1       20.0   United Air Lines Inc.
2           2       33.0  American Airlines Inc.
3           3      -18.0         JetBlue Airways
4           4      -25.0    Delta Air Lines Inc.

In [43]:

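# Compute on the arrival-delay column only, with missing values removed
arr_delay=flights['arr_delay'].dropna()
f_mean=np.mean(arr_delay)
f_median=np.median(arr_delay)
f_mode=stats.mode(arr_delay).mode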
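As an alternative to NumPy and SciPy, a pandas Series has its own mean, median, and mode methods, which skip missing values by default. The sketch below uses a short made-up Series, `delays`, rather than the real arr_delay column. The `d_`-prefixed names are illustrative.

```python
import pandas as pd

# Made-up arrival delays in minutes (negative = early); None marks a missing value
delays = pd.Series([11.0, 20.0, 33.0, -18.0, -25.0, 11.0, None])

d_mean = delays.mean()      # missing values are skipped by default
d_median = delays.median()
d_modes = delays.mode()     # Series holding every most-frequent value

print("mean:", round(d_mean, 2))
print("median:", d_median)
print("mode(s):", d_modes.tolist())
```

Because `mode()` returns a Series, a single value can be taken with `d_modes.iloc[0]` when a scalar such as f_mode is wanted.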
