Visualizations with Matplotlib
Understanding Relationship plots
Relation plot¶
To start, we need to set up our Python environment with the necessary libraries. We'll use Matplotlib for visualization, NumPy for simulating numerical data, and pandas for loading and aggregating the dataset used in the activities.
Here's how you can import these libraries:
In [25]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
Different Types of Relationship Plots¶
Let's create simulated data suitable for a line plot: a simple dataset in which one variable changes over time or in relation to another variable.
In [2]:
# Generate time points (e.g., months)
time = np.arange(0, 60, 1) # 60 months (5 years)
# Generate revenue data with a trend and some randomness
np.random.seed(0) # Seed for reproducibility
trend = time * 1.5 # Trend: Revenue increases over time
variability = np.random.normal(0, 10, size=time.size) # Random variability
# Total revenue is a combination of trend and variability
revenue = trend + variability
# Plot the simulated data
plt.plot(time, revenue)
plt.title("Simulated Company Revenue Over Time")
plt.xlabel("Time (Months)")
plt.ylabel("Revenue (in thousands)")
plt.show()
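The line plot above mixes the upward trend with month-to-month noise. As a minimal sketch of how the trend could be made easier to see, the cell below overlays a moving average computed with `np.convolve`. The 6-month window is an assumption chosen only for illustration, and the revenue series is recreated so the cell runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt

# Recreate the simulated revenue series from the previous cell
time = np.arange(0, 60, 1)
np.random.seed(0)
revenue = time * 1.5 + np.random.normal(0, 10, size=time.size)

# Assumption: a 6-month window, picked only for illustration
window = 6
kernel = np.ones(window) / window
# mode="valid" keeps only positions where the full window fits
smoothed = np.convolve(revenue, kernel, mode="valid")

# Align each average with the last month of its window
smoothed_time = time[window - 1:]

plt.plot(time, revenue, alpha=0.5, label="Monthly revenue")
plt.plot(smoothed_time, smoothed, label=f"{window}-month moving average")
plt.title("Simulated Company Revenue Over Time")
plt.xlabel("Time (Months)")
plt.ylabel("Revenue (in thousands)")
plt.legend()
plt.show()
```

A wider window gives a smoother curve but reacts more slowly to real changes, so the window size is a trade-off rather than a fixed rule.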
Matplotlib can also plot a mathematical function. Let's plot the sine function, sin(x), with x in the interval [0, 2 * pi].
In [3]:
# Create points for the curve
X = np.linspace(0, 2 * np.pi, 100) # Evenly spaced points from 0 to 2*pi
Y = np.sin(X) # Sine of each point in X
# Plot the sine curve
plt.plot(X, Y)
plt.title('Sine Curve')
plt.xlabel('X')
plt.ylabel('sin(X)')
plt.grid(True)
plt.show()
Plotting multiple curves.¶
In [4]:
# Create points for the curves
X = np.linspace(0, 2 * np.pi, 100)
Y_sine = np.sin(X)
Y_cosine = np.cos(X)
# Plot the sine and cosine curves
plt.plot(X, Y_sine, label='sin(X)')
plt.plot(X, Y_cosine, label='cos(X)', linestyle='--')
plt.title('Sine vs. Cosine')
plt.xlabel('X')
plt.ylabel('Value')
plt.grid(True)
plt.legend()
plt.show()
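Drawing both curves on one set of axes works well here because they share the same scale. As an alternative sketch, the cell below places each curve in its own panel using `plt.subplots` with a shared x-axis, which may be easier to read when curves overlap heavily. The layout (two rows, one column) is an illustrative choice, not something the original cell requires.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same points as in the previous cell
X = np.linspace(0, 2 * np.pi, 100)
Y_sine = np.sin(X)
Y_cosine = np.cos(X)

# Two stacked panels that share the x-axis
fig, axes = plt.subplots(2, 1, sharex=True)

axes[0].plot(X, Y_sine)
axes[0].set_ylabel('sin(X)')
axes[0].grid(True)

axes[1].plot(X, Y_cosine, linestyle='--')
axes[1].set_ylabel('cos(X)')
axes[1].set_xlabel('X')
axes[1].grid(True)

fig.suptitle('Sine and Cosine in Separate Panels')
plt.show()
```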
Scatter-plot.¶
A scatter plot shows the relationship between two variables by drawing one marker per observation. Below, y is simulated as 2x plus standard normal noise, so the points should cluster around a straight line.
In [5]:
x = np.linspace(0, 10, 100)
y = 2 * x + np.random.normal(size=100)
plt.scatter(x, y)
plt.title("Simulated Data Relationship")
plt.xlabel("X Variable")
plt.ylabel("Y Variable")
plt.show()
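A scatter plot suggests a relationship visually. A rough way to put numbers on it is to compute the Pearson correlation with `np.corrcoef` and fit a straight line with `np.polyfit`. The cell below is a sketch under one assumption: the seed is set again so the result is reproducible when the cell runs on its own, which means the points may differ from those in the plot above.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)  # assumption: re-seeded so this cell is reproducible on its own
x = np.linspace(0, 10, 100)
y = 2 * x + np.random.normal(size=100)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# Degree-1 least-squares fit returns (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)

plt.scatter(x, y, label="Simulated data")
plt.plot(x, slope * x + intercept, color="r", label="Least-squares fit")
plt.title("Simulated Data Relationship")
plt.xlabel("X Variable")
plt.ylabel("Y Variable")
plt.legend()
plt.show()

print(f"Pearson r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Because the data were generated as 2x plus noise, the fitted slope should land close to 2 and the correlation close to 1.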
Scatter plots can also compare two different datasets on the same axes:
In [6]:
# Two groups of values plotted against the same x values
girls = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
boys = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
prom = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
fig, ax = plt.subplots(figsize=(5, 2))
# Label each series so the legend matches the right color
ax.scatter(prom, girls, color='r', label='Girls')
ax.scatter(prom, boys, color='b', label='Boys')
ax.set_xlabel('Mean')
ax.set_ylabel('Total')
ax.grid()
ax.set_title('Visualizing Data by Gender with Scatter Plots')
ax.legend()
plt.show()
Activities¶
Activity 1: Understanding Scatter Plots¶
In [ ]:
# Select the correct answer
Activity 2: Understanding Line Plots¶
In [ ]:
# Select the correct answer
Activity 3: Visualizing Yearly Passenger Numbers¶
In [7]:
flights = pd.read_csv("flights.csv")
flights.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 144 entries, 0 to 143
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Unnamed: 0  144 non-null    int64
 1   year        144 non-null    int64
 2   month       144 non-null    object
 3   passengers  144 non-null    int64
dtypes: int64(3), object(1)
memory usage: 4.6+ KB
In [8]:
flights.head()
Out[8]:
| | Unnamed: 0 | year | month | passengers |
|---|---|---|---|---|
| 0 | 0 | 1949 | Jan | 112 |
| 1 | 1 | 1949 | Feb | 118 |
| 2 | 2 | 1949 | Mar | 132 |
| 3 | 3 | 1949 | Apr | 129 |
| 4 | 4 | 1949 | May | 121 |
In [11]:
flights.groupby('year')['passengers'].sum().reset_index()
Out[11]:
| | year | passengers |
|---|---|---|
| 0 | 1949 | 1520 |
| 1 | 1950 | 1676 |
| 2 | 1951 | 2042 |
| 3 | 1952 | 2364 |
| 4 | 1953 | 2700 |
| 5 | 1954 | 2867 |
| 6 | 1955 | 3408 |
| 7 | 1956 | 3939 |
| 8 | 1957 | 4421 |
| 9 | 1958 | 4572 |
| 10 | 1959 | 5140 |
| 11 | 1960 | 5714 |
In [28]:
# Load the dataset
flights = pd.read_csv("flights.csv")
# Aggregate data by year
yearly_passengers = flights.groupby('year')['passengers'].sum().reset_index()
fig, ax = plt.subplots()
# Create the line plot
ax.plot(yearly_passengers['year'], yearly_passengers['passengers'])
ax.set_title('Yearly Passenger Numbers')
ax.set_xlabel('Year')
ax.set_ylabel('Number of Passengers')
plt.show()
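The yearly totals rise every year, but the line plot alone does not show how fast. As a possible follow-up, the sketch below computes the year-over-year percentage change with `pct_change` and plots it. To keep the cell self-contained, the totals are copied from the `groupby` output shown earlier instead of being read from `flights.csv`; the `growth_pct` column name is introduced here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Yearly totals copied from the groupby output above (no CSV needed here)
yearly_passengers = pd.DataFrame({
    "year": list(range(1949, 1961)),
    "passengers": [1520, 1676, 2042, 2364, 2700, 2867,
                   3408, 3939, 4421, 4572, 5140, 5714],
})

# Percentage change relative to the previous year; the first year has no
# predecessor, so its value is NaN and is simply not drawn
yearly_passengers["growth_pct"] = yearly_passengers["passengers"].pct_change() * 100

fig, ax = plt.subplots()
ax.plot(yearly_passengers["year"], yearly_passengers["growth_pct"], marker="o")
ax.set_title("Year-over-Year Change in Passenger Numbers")
ax.set_xlabel("Year")
ax.set_ylabel("Change (%)")
ax.grid(True)
plt.show()
```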