Visualizations with Matplotlib
Understanding Relationship plots
Relationship plots
To start, we set up the Python environment with the necessary libraries: pandas for loading tabular data, Matplotlib for visualization, and NumPy for numerical data generation and simulation.
Here's how you can import these libraries:
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
Different Types of Relationship Plots
Activity 1: Simulating data for a line plot. Line plots work well when one variable changes over time or in response to another variable, so we generate a simple dataset of monthly revenue that grows along a linear trend plus random noise.
In [2]:
# Generate time points (e.g., months)
time = np.arange(0, 60, 1) # 60 months (5 years)
# Generate revenue data with a trend and some randomness
np.random.seed(0) # Seed for reproducibility
trend = time * 1.5 # Trend: Revenue increases over time
variability = np.random.normal(0, 10, size=time.size) # Random variability
# Total revenue is a combination of trend and variability
revenue = trend + variability
# Plot the simulated data
plt.plot(time, revenue)
plt.title("Simulated Company Revenue Over Time")
plt.xlabel("Time (Months)")
plt.ylabel("Revenue (in thousands)")
plt.show()
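Because the revenue was generated as a linear trend (slope 1.5 per month) plus noise, one way to check that the trend is visible in the data is to fit a straight line and overlay it on the original series. The sketch below is an illustration, not part of the original activity; it recreates the Activity 1 data and uses `np.polyfit` with degree 1 for an ordinary least-squares line.

```python
import matplotlib.pyplot as plt
import numpy as np

# Recreate the Activity 1 data (same seed and generation order)
time = np.arange(0, 60, 1)
np.random.seed(0)
revenue = time * 1.5 + np.random.normal(0, 10, size=time.size)

# Degree-1 least-squares fit; polyfit returns [slope, intercept]
slope, intercept = np.polyfit(time, revenue, 1)
fitted = slope * time + intercept

fig, ax = plt.subplots()
ax.plot(time, revenue, label="Simulated revenue")
ax.plot(time, fitted, linestyle="--", label=f"Linear fit (slope ≈ {slope:.2f})")
ax.set_title("Simulated Company Revenue Over Time")
ax.set_xlabel("Time (Months)")
ax.set_ylabel("Revenue (in thousands)")
ax.legend()
plt.show()
```

The fitted slope should land close to the 1.5 used to build the trend; the noise keeps it from matching exactly.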
Activity 2: Plotting a mathematical function. Here we plot the sine function, sin(x), over the interval [0, 2π].
In [3]:
# Create points for the curve
X = np.linspace(0, 2 * np.pi, 100) # Evenly spaced points from 0 to 2*pi
Y = np.sin(X) # Sine of each point in X
# Plot the sine curve
plt.plot(X, Y)
plt.title('Sine Curve')
plt.xlabel('X')
plt.ylabel('sin(X)')
plt.grid(True)
plt.show()
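Since the x-axis spans [0, 2π], default decimal tick labels make it hard to read off landmarks such as the maximum at π/2. A small variation, offered as a suggestion rather than part of the exercise, places ticks at multiples of π/2 and labels them with Matplotlib's built-in mathtext (`$...$` strings).

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(0, 2 * np.pi, 100)
Y = np.sin(X)

fig, ax = plt.subplots()
ax.plot(X, Y)

# Ticks at multiples of pi/2, labelled with mathtext
ticks = [0, np.pi / 2, np.pi, 3 * np.pi / 2, 2 * np.pi]
labels = ["0", r"$\pi/2$", r"$\pi$", r"$3\pi/2$", r"$2\pi$"]
ax.set_xticks(ticks)
ax.set_xticklabels(labels)

ax.set_title("Sine Curve")
ax.set_xlabel("X")
ax.set_ylabel("sin(X)")
ax.grid(True)
plt.show()
```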
Activity 3: Plotting multiple curves
Plotting two curves on the same axes makes them easy to compare. Here we plot the sine and cosine functions over the same interval to see where they intersect, where they diverge, and how they relate: cosine is sine shifted left by π/2, since cos(x) = sin(x + π/2).
In [4]:
# Create points for the curves
X = np.linspace(0, 2 * np.pi, 100)
Y_sine = np.sin(X)
Y_cosine = np.cos(X)
# Plot the sine and cosine curves
plt.plot(X, Y_sine, label='sin(X)')
plt.plot(X, Y_cosine, label='cos(X)', linestyle='--')
plt.title('Sine vs. Cosine')
plt.xlabel('X')
plt.ylabel('Value')
plt.grid(True)
plt.legend()
plt.show()
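The visual comparison can be backed up with numbers. The sketch below, an illustrative extension of the activity, estimates where the curves cross by finding sign changes of sin(X) − cos(X) and linearly interpolating between neighbouring samples; analytically the crossings are at π/4 and 5π/4. It also computes the Pearson correlation with `np.corrcoef`, which over one full period comes out essentially zero: Pearson correlation measures only linear association, so it misses the exact phase-shift relationship between the two curves.

```python
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(0, 2 * np.pi, 100)
Y_sine = np.sin(X)
Y_cosine = np.cos(X)

# Indices i where sin - cos changes sign between samples i and i+1
d = Y_sine - Y_cosine
i = np.where(np.diff(np.sign(d)) != 0)[0]
# Linear interpolation between the two samples to estimate each crossing
crossings = X[i] - d[i] * (X[i + 1] - X[i]) / (d[i + 1] - d[i])

# Pearson correlation between the sampled curves
r = np.corrcoef(Y_sine, Y_cosine)[0, 1]

fig, ax = plt.subplots()
ax.plot(X, Y_sine, label="sin(X)")
ax.plot(X, Y_cosine, label="cos(X)", linestyle="--")
ax.scatter(crossings, np.sin(crossings), color="black", zorder=3, label="Crossings")
ax.set_title(f"Sine vs. Cosine (Pearson r = {r:.2f})")
ax.set_xlabel("X")
ax.set_ylabel("Value")
ax.grid(True)
ax.legend()
plt.show()

print(crossings)
```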
Activity 5: Scatter plot
A scatter plot draws each (x, y) pair as an individual point, which makes the relationship between two variables easy to see. Here we simulate a noisy linear relationship, y = 2x plus Gaussian noise, and plot it.
In [5]:
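# 100 evenly spaced x values between 0 and 10
x = np.linspace(0, 10, 100)
# y follows the line y = 2x plus standard normal noise
y = 2 * x + np.random.normal(size=100)
plt.scatter(x, y)
plt.title("Simulated Data Relationship")
plt.xlabel("X Variable")
plt.ylabel("Y Variable")
plt.show()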
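To quantify the relationship the scatter plot shows, a common next step is to compute the Pearson correlation coefficient and overlay a least-squares line. This is a sketch, not part of the original activity; it uses a seeded `np.random.default_rng(0)` generator (the seed value is arbitrary) so the example is reproducible on its own, and `alpha=0.6` to make overlapping points easier to see.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
x = np.linspace(0, 10, 100)
y = 2 * x + rng.normal(size=100)

# Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]
# Least-squares line y ≈ slope * x + intercept
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6, label="Simulated points")
ax.plot(x, slope * x + intercept, color="red",
        label=f"Fit: y = {slope:.2f}x + {intercept:.2f}")
ax.set_title(f"Simulated Data Relationship (r = {r:.3f})")
ax.set_xlabel("X Variable")
ax.set_ylabel("Y Variable")
ax.legend()
plt.show()
```

With noise this small relative to the range of 2x, r should be close to 1 and the fitted slope close to 2.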
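Activity 6: Comparing two datasets with scatter plots
Scatter plots can also compare two datasets measured against the same x values. Here two series, one for girls and one for boys, are drawn in different colors on the same axes.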
In [6]:
girls = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
boys = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
prom = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
fig, ax = plt.subplots(figsize=(5, 2))
# Label each series directly so the legend picks the labels up automatically
ax.scatter(prom, girls, color='r', label='Girls')
ax.scatter(prom, boys, color='b', label='Boys')
ax.set_xlabel('Mean')
ax.set_ylabel('Total')
ax.grid()
ax.set_title('Visualizing Data by Gender with Scatter Plots')
ax.legend()
plt.show()
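One way to make the gap between the two series easier to read is to add a dashed horizontal line at each group's mean with `ax.axhline`. The sketch below reuses the Activity 6 lists; the enlarged figure size and the mean lines are presentation choices added here for illustration, not part of the original exercise.

```python
import matplotlib.pyplot as plt
import numpy as np

girls = [89, 90, 70, 89, 100, 80, 90, 100, 80, 34]
boys = [30, 29, 49, 48, 100, 48, 38, 45, 20, 30]
prom = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

girls_mean = np.mean(girls)
boys_mean = np.mean(boys)

fig, ax = plt.subplots(figsize=(6, 3))  # a bit taller to fit the extra legend entries
ax.scatter(prom, girls, color="r", label="Girls")
ax.scatter(prom, boys, color="b", label="Boys")
# Dashed reference line at each group's mean, in the group's color
ax.axhline(girls_mean, color="r", linestyle="--", label=f"Girls mean ({girls_mean:.1f})")
ax.axhline(boys_mean, color="b", linestyle="--", label=f"Boys mean ({boys_mean:.1f})")
ax.set_xlabel("Mean")
ax.set_ylabel("Total")
ax.grid()
ax.set_title("Visualizing Data by Gender with Scatter Plots")
ax.legend(fontsize="small")
plt.show()
```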
Activity 7: Visualizing Yearly Passenger Numbers
In this exercise, you will visualize the total number of passengers traveling each year using the "flights" dataset from Seaborn. Your task involves aggregating the monthly passenger data into yearly totals and creating a line plot to visualize these totals over time.
- Load the Dataset: Load the "flights.csv" dataset into a variable named `flights`.
- Aggregate Data: Group the data by year and calculate the total number of passengers for each year. Store the aggregated data in a variable named `yearly_passengers`.
- Create the Plot: Use Matplotlib to plot the yearly passenger numbers. Your plot should have years on the x-axis and the total number of passengers on the y-axis.
In [7]:
flights = pd.read_csv("flights.csv")
flights.head()
Out[7]:
|   | Unnamed: 0 | year | month | passengers |
|---|---|---|---|---|
| 0 | 0 | 1949 | Jan | 112 |
| 1 | 1 | 1949 | Feb | 118 |
| 2 | 2 | 1949 | Mar | 132 |
| 3 | 3 | 1949 | Apr | 129 |
| 4 | 4 | 1949 | May | 121 |
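The `Unnamed: 0` column is what pandas produces when the first header cell of a CSV is empty, which usually means the file was saved together with its row index (for example by `DataFrame.to_csv` with its default `index=True`). Assuming that is how flights.csv was written, passing `index_col=0` to `pd.read_csv` reads that column as the index instead of as data. The sketch below uses a small in-memory CSV imitating the first rows shown above, so it runs without the file.

```python
import io

import pandas as pd

# A few rows shaped like flights.csv; the empty first header cell is the
# (assumed) saved row index
csv_text = """,year,month,passengers
0,1949,Jan,112
1,1949,Feb,118
2,1949,Mar,132
"""

# Default: the unnamed first column becomes a data column called 'Unnamed: 0'
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())

# index_col=0: the first column is used as the row index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())
```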
In [11]:
yearly_passengers = flights.groupby("year").agg({"passengers": "sum"})
yearly_passengers.reset_index(inplace=True)
yearly_passengers
Out[11]:
|    | year | passengers |
|---|---|---|
| 0 | 1949 | 1520 |
| 1 | 1950 | 1676 |
| 2 | 1951 | 2042 |
| 3 | 1952 | 2364 |
| 4 | 1953 | 2700 |
| 5 | 1954 | 2867 |
| 6 | 1955 | 3408 |
| 7 | 1956 | 3939 |
| 8 | 1957 | 4421 |
| 9 | 1958 | 4572 |
| 10 | 1959 | 5140 |
| 11 | 1960 | 5714 |
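The aggregation above uses a dict-based `agg` followed by `reset_index`. An equivalent and slightly shorter idiom selects the column and passes `as_index=False` to `groupby`, which keeps `year` as a regular column. The sketch compares the two on a hypothetical toy DataFrame (the numbers are made up for illustration and are not the flights data).

```python
import pandas as pd

# Hypothetical toy data, not the real flights dataset
df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [10, 20, 30, 40],
})

# Approach used in the activity: dict-based agg, then move 'year' back to a column
a = df.groupby("year").agg({"passengers": "sum"}).reset_index()

# Equivalent idiom: as_index=False keeps 'year' as a regular column
b = df.groupby("year", as_index=False)["passengers"].sum()

print(b)
```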
In [12]:
# Create a Figure and Axes
fig, ax = plt.subplots()
# Plot the data on the Axes
ax.plot(yearly_passengers['year'], yearly_passengers['passengers'])
# Add title and labels
ax.set_title('Yearly Passenger Numbers')
ax.set_xlabel('Year')
ax.set_ylabel('Number of Passengers')
# Display the plot
plt.show()
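With only twelve yearly points, it can help to mark each observation and give every year its own tick. The sketch below copies the totals from the Out[11] table so it runs without the CSV, adds `marker="o"` to the line, and, as an extra illustration, computes year-over-year growth with `pct_change` (the first year has no predecessor, so its growth is NaN).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Yearly totals copied from the Out[11] table
yearly_passengers = pd.DataFrame({
    "year": list(range(1949, 1961)),
    "passengers": [1520, 1676, 2042, 2364, 2700, 2867,
                   3408, 3939, 4421, 4572, 5140, 5714],
})

# Year-over-year growth in percent
yearly_passengers["growth_pct"] = yearly_passengers["passengers"].pct_change() * 100

fig, ax = plt.subplots()
# marker="o" makes each yearly data point visible on the line
ax.plot(yearly_passengers["year"], yearly_passengers["passengers"], marker="o")
ax.set_xticks(yearly_passengers["year"])  # one tick per year
ax.set_title("Yearly Passenger Numbers")
ax.set_xlabel("Year")
ax.set_ylabel("Number of Passengers")
plt.show()

print(yearly_passengers.round(1))
```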