Introduction to Inferential Statistics
Hypothesis testing: one sample (Z-test)
In [1]:
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
Understanding the Z-Test with Figures¶
- Historical Average Test Score $\mu_0$: 70 points.
- Standard Deviation $\sigma$: 10 points (based on historical data).
- Sample Average Test Score $\bar{x}$: 75 points.
- Sample Size $n$: 40 students.
Hypotheses¶
- Null Hypothesis $H_0$: The new study method does not increase the average test score ($\mu \leq \mu_0$).
- Alternative Hypothesis $H_A$: The new study method increases the average test score ($\mu > \mu_0$).
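With the values listed above, the test statistic can be worked out by hand before running any code. This sketch assumes the same $\alpha = 0.05$ used for the shaded region in the figure code, for which the right-tailed critical value is the 95% quantile of the standard normal distribution, about 1.645:

$$ Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{75 - 70}{10/\sqrt{40}} \approx 3.16 $$

Since $3.16 > 1.645$, the statistic lands in the critical region, so $H_0$ would be rejected at that level.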
In [2]:
# Parameters
mu_0 = 70 # Historical average
sample_mean = 75 # Sample mean
sigma = 10 # Historical standard deviation
n = 40 # Number of students
# Calculate the Z-statistic
z = (sample_mean - mu_0) / (sigma / np.sqrt(n))
# Standard normal distribution
x = np.linspace(-4, 4, 1000)
y = stats.norm.pdf(x, 0, 1)
# Plotting
plt.figure(figsize=(10, 5))
plt.plot(x, y, label='Standard Normal Distribution')
# Highlight the critical region for alpha = 0.05 in a one-tailed test (right-tailed)
z_critical = stats.norm.ppf(0.95) # 95% quantile for right-tailed
plt.fill_between(x, y, where=x >= z_critical, color='red', alpha=0.5, label='Critical region (alpha = 0.05)')
# Add Z-value marker
plt.axvline(x=z, color='green', linestyle='--', label=f'Z-value = {z:.2f}')
plt.annotate(f'Z={z:.2f}', xy=(z, 0.02), xytext=(z-0.5, 0.15),
arrowprops=dict(facecolor='green', shrink=0.05))
plt.title('Z-Test Visualization for One-Tailed Test')
plt.xlabel('Z-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
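The figure shows where the Z-value falls relative to the critical region; the same decision can be made numerically. The following is a minimal sketch, assuming $\alpha = 0.05$ as in the plot, that compares the statistic with the critical value and also computes the right-tailed p-value with `stats.norm.sf` (the survival function, equal to 1 minus the CDF). The name `reject_h0` is introduced here for illustration only.

```python
import numpy as np
from scipy import stats

# Values from the study-method example above
mu_0, sample_mean, sigma, n = 70, 75, 10, 40
alpha = 0.05  # assumed significance level, matching the shaded region

z = (sample_mean - mu_0) / (sigma / np.sqrt(n))
z_critical = stats.norm.ppf(1 - alpha)  # right-tailed critical value
p_value = stats.norm.sf(z)              # P(Z >= z) under H0

# For a right-tailed test, p_value < alpha is equivalent to z > z_critical
reject_h0 = p_value < alpha

print(f"z = {z:.2f}, critical value = {z_critical:.2f}, p-value = {p_value:.5f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Both criteria always agree for a given $\alpha$; the p-value additionally shows how far into the tail the observed statistic lies.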
Consider a scenario where we're testing whether a new teaching method affects student performance. Assume the following:
- Population mean $\mu_0$ = 75 (average test score with traditional methods)
- Sample mean $\bar{X}$ = 80
- Standard deviation $\sigma$ = 10
- Sample size $n$ = 30
The Z-statistic would be calculated as follows: $$ Z = \frac{(80 - 75)}{(10/\sqrt{30})} \approx 2.74 $$
In [3]:
# Parameters
mu_0 = 75 # Hypothesized population mean
sample_mean = 80 # Sample mean
sigma = 10 # Standard deviation of the population
n = 30 # Sample size
# Calculate the Z-statistic
z = (sample_mean - mu_0) / (sigma / np.sqrt(n))
# Plotting the standard normal distribution
x = np.linspace(-4, 4, 1000)
y = stats.norm.pdf(x, 0, 1) # Standard normal distribution
plt.figure(figsize=(10, 5))
plt.plot(x, y, label='Standard Normal Distribution')
# Highlighting the critical region for a two-tailed test at alpha = 0.05
z_critical = stats.norm.ppf(0.975) # Two-tailed test, so 0.975 quantile
plt.fill_between(x, y, where=(x >= z_critical) | (x <= -z_critical), color='red', alpha=0.5, label='Critical region (alpha = 0.05)')
# Adding the Z-value marker
plt.axvline(x=z, color='green', linestyle='--', label=f'Z-value = {z:.2f}')
plt.annotate(f'Z={z:.2f}', xy=(z, 0.02), xytext=(z+0.5, 0.1),
arrowprops=dict(facecolor='green', shrink=0.05))
plt.title('Z-Test Visualization')
plt.xlabel('Z-value')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()
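For a two-tailed test, the rejection rule uses the absolute value of the statistic, and the p-value counts both tails. A short sketch under the same assumption of $\alpha = 0.05$ (the variable `reject_h0` is illustrative, not part of the original notebook):

```python
import numpy as np
from scipy import stats

# Values from the teaching-method example above
mu_0, sample_mean, sigma, n = 75, 80, 10, 30
alpha = 0.05  # assumed significance level, split across both tails

z = (sample_mean - mu_0) / (sigma / np.sqrt(n))
z_critical = stats.norm.ppf(1 - alpha / 2)  # about 1.96
p_value = 2 * stats.norm.sf(abs(z))         # area in both tails beyond |z|

reject_h0 = abs(z) > z_critical

print(f"z = {z:.2f}, critical value = +/-{z_critical:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

Using `abs(z)` matters when the sample mean is below the hypothesized mean: without it, a negative statistic would produce a "p-value" greater than 1.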
Exercises 2¶
Exercise 1: Testing New Drug Efficacy¶
Background: A pharmaceutical company has developed a new drug intended to lower blood pressure more effectively than the current standard medication. The known average decrease in blood pressure with the standard medication is 8 mmHg, with a standard deviation of 3 mmHg.
Task: Test if the new drug is more effective at lowering blood pressure compared to the standard medication.
- $H_0$: The new drug has an average decrease in blood pressure of 8 mmHg (the same as the standard medication).
- $H_A$: The new drug has an average decrease in blood pressure greater than 8 mmHg.
Assume you test the drug on 36 patients and find the average decrease in blood pressure is 10 mmHg.
Significance Level: Set $\alpha = 0.05$.
Python Task: Calculate the z-statistic and the p-value. Determine if the new drug is statistically more effective.
This exercise applies hypothesis testing to a new scenario in Python. The same pattern can be adapted to any of the exercises:
In [4]:
import scipy.stats as stats
# Sample parameters
mean_sample = 10
n = 36
std_dev = 3
mean_population = 8
# Calculating the z-statistic
z = (mean_sample - mean_population) / (std_dev / (n ** 0.5))
# Calculating the p-value for a one-sided (right-tailed) test, since H_A is "greater than 8 mmHg"
p = 1 - stats.norm.cdf(z)
# For a two-sided test, use instead: p = (1 - stats.norm.cdf(abs(z))) * 2
print(f'Z-statistic: {z}')
print(f'P-value: {p}')
Z-statistic: 4.0 P-value: 3.167124183311998e-05
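Because the choice of tail depends on how $H_A$ is worded, it can help to make that choice an explicit argument rather than a commented-out line. The helper below is a hypothetical convenience function (`one_sample_z_test` and its `alternative` parameter are not part of the original notebook), sketched here for the drug-efficacy data, where $H_A$ is "greater than":

```python
from math import sqrt
from scipy import stats


def one_sample_z_test(sample_mean, mu_0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample Z-test with known sigma.

    alternative: "greater", "less", or "two-sided" (hypothetical parameter
    naming, chosen for this sketch).
    """
    z = (sample_mean - mu_0) / (sigma / sqrt(n))
    if alternative == "greater":
        p_value = stats.norm.sf(z)           # right tail
    elif alternative == "less":
        p_value = stats.norm.cdf(z)          # left tail
    elif alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))  # both tails
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Exercise 1: is the mean decrease greater than 8 mmHg?
alpha = 0.05
z, p = one_sample_z_test(10, 8, 3, 36, alternative="greater")
print(f"z = {z:.2f}, one-sided p-value = {p:.2e}")
print("Reject H0" if p < alpha else "Fail to reject H0")
```

With a p-value far below $\alpha = 0.05$, the data support the claim that the new drug lowers blood pressure more than the standard medication.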
Exercise 2: Checking Production Quality¶
Background: A factory claims that their production machines produce screws that are 5 cm long on average, with a production variability (standard deviation) of 0.1 cm.
Task: Verify if the production process is accurate and the screws meet the claimed average length.
- $H_0$: The average length of screws produced is 5 cm.
- $H_A$: The average length of screws produced is not 5 cm.
You measure 100 randomly selected screws and find their average length to be 4.98 cm.
Significance Level: Set $\alpha = 0.01$.
Python Task: Calculate the z-statistic and the p-value to check if the production claim holds.
In [6]:
# Sample parameters
mean_sample = 4.98 # Measured average screw length (cm)
n = 100 # Number of screws measured
std_dev = 0.1 # Known standard deviation (cm)
mean_population = 5 # Claimed average length (cm)
# Calculating the z-statistic
z = (mean_sample - mean_population) / (std_dev / (n ** 0.5))
# Calculating the p-value for a two-sided test (H_A: the mean is not 5 cm);
# abs(z) keeps the p-value within [0, 1] when z is negative
p = (1 - stats.norm.cdf(abs(z))) * 2
print(f'Z-statistic: {z:.2f}')
print(f'P-value: {p:.5f}')
Z-statistic: -2.00 P-value: 0.04550
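A two-sided Z-test at level $\alpha$ is equivalent to checking whether the claimed mean lies inside the $(1 - \alpha)$ confidence interval around the sample mean. The sketch below illustrates that equivalence for the screw data at $\alpha = 0.01$; the names `margin`, `ci_low`, `ci_high`, and `claim_inside` are introduced only for this example.

```python
from math import sqrt
from scipy import stats

# Values from Exercise 2
mean_sample, mean_population, std_dev, n = 4.98, 5, 0.1, 100
alpha = 0.01

z_critical = stats.norm.ppf(1 - alpha / 2)  # about 2.576 for a 99% interval
margin = z_critical * std_dev / sqrt(n)     # half-width of the interval

ci_low, ci_high = mean_sample - margin, mean_sample + margin
claim_inside = ci_low <= mean_population <= ci_high

print(f"99% CI for the mean length: [{ci_low:.4f}, {ci_high:.4f}] cm")
print("Fail to reject H0" if claim_inside else "Reject H0")
```

The claimed 5 cm falls inside the interval, so at the 1% level the sample does not contradict the factory's claim.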
Exercise 3: Coffee Temperature Standards¶
Background: A coffee shop claims that the average temperature of the coffee they serve is 65°C. Assume that the standard deviation of coffee temperature is 3°C based on past data.
Task: Assess whether the coffee shop maintains this temperature standard.
- $H_0$: The average temperature of the served coffee is 65°C.
- $H_A$: The average temperature of the served coffee is not 65°C.
Suppose you measure the temperature of 64 cups of coffee and find an average temperature of 63.5°C.
Significance Level: Set $\alpha = 0.05$.
Activity 1¶
In [7]:
# Parameters
mean_sample = 63.5 # Measured average temperature (°C)
n = 64 # Number of cups measured
std_dev = 3 # Known standard deviation (°C)
mean_population = 65 # Claimed average temperature (°C)
# Calculate the Z-statistic
z = (mean_sample - mean_population) / (std_dev / (n ** 0.5))
# Calculate the two-sided p-value
p = (1 - stats.norm.cdf(abs(z))) * 2
print(f'Z-statistic: {z:.2f}')
print(f'P-value: {p:.5f}')
Z-statistic: -4.00 P-value: 0.00006
Exercise 4: Product Weight Accuracy¶
Background: A chocolate factory claims that their chocolate bars weigh exactly 100 grams each. Based on past data, it is known that the standard deviation of the chocolate bar weights is 2 grams.
Task: Verify whether the factory's claim about the average weight of their chocolate bars holds true.
- $H_0$: The average weight of the chocolate bars is 100 grams.
- $H_A$: The average weight of the chocolate bars is not 100 grams.
Suppose a random sample of 49 chocolate bars is selected and the average weight measured is 98.5 grams.
Significance Level: Set $\alpha = 0.05$.
Activity 2¶
In [8]:
# Parameters
mean_sample = 98.5 # Measured average weight (g)
n = 49 # Number of bars sampled
std_dev = 2 # Known standard deviation (g)
mean_population = 100 # Claimed average weight (g)
# Calculate the Z-statistic
z = (mean_sample - mean_population) / (std_dev / (n ** 0.5))
# Calculate the two-sided p-value
p = (1 - stats.norm.cdf(abs(z))) * 2
print(f'Z-statistic: {z:.2f}')
print(f'P-value: {p:.5f}')
Z-statistic: -5.25 P-value: 0.00000