Statement of Completion#0dcfcb55
Numeric programming with Numpy
medium
Visualize Home Loan Financial Data using Pandas, Numpy and Matplotlib
Project.ipynb
Visualisation with NumPy¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [120]:
df = pd.read_csv('home_loan_data.csv')
df.head(4)
Out[120]:
| | ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LP001367 | Male | Yes | 1 | Graduate | No | 3052 | 1030.0 | 100.0 | 360.0 | 1.0 | Urban | Y |
1 | LP002517 | Male | Yes | 1 | Not Graduate | No | 2653 | 1500.0 | 113.0 | 180.0 | 0.0 | Rural | N |
2 | LP001398 | Male | No | 0 | Graduate | NaN | 5050 | 0.0 | 118.0 | 360.0 | 1.0 | Semiurban | Y |
3 | LP001318 | Male | Yes | 2 | Graduate | No | 6250 | 5654.0 | 188.0 | 180.0 | 1.0 | Semiurban | Y |
Warm-up Activity¶
Activity 1: What function is used to generate an evenly spaced sequence of numbers for plotting a 1D array?¶
In [ ]:
# Write your code here
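One likely answer is np.linspace, which takes a start, a stop and the number of points wanted (np.arange, which takes a step size instead, is the other common choice). A minimal sketch with arbitrary endpoints chosen only for illustration:

```python
import numpy as np

# 5 evenly spaced points from 0 to 1, both endpoints included
x = np.linspace(0.0, 1.0, num=5)

# A 1D array of y values computed from x, ready to pass to ax.plot(x, y)
y = x ** 2

print(x.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```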
Activity 2. What does fig, ax = plt.subplots() do in Matplotlib?¶
In [ ]:
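In short, plt.subplots() creates a new Figure together with one or more Axes and returns both, so that plotting calls can be made on the Axes object. The sketch below illustrates this; the Agg backend line is an assumption made only so the code runs without a display, and it can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line in Jupyter
import matplotlib.pyplot as plt

# With no arguments: one Figure containing a single Axes
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

# With nrows/ncols: the second return value is a NumPy array of Axes
fig2, axes = plt.subplots(nrows=1, ncols=2)

print(type(fig).__name__)  # Figure
print(axes.shape)          # (2,)

plt.close("all")  # release the figures created above
```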
Activity 3. What does np.arange(2, 10, 2) return?¶
In [ ]:
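np.arange(start, stop, step) counts up from start in increments of step and excludes stop, so the call in this activity can be checked directly:

```python
import numpy as np

# Start at 2, step by 2, stop before reaching 10
seq = np.arange(2, 10, 2)

print(seq.tolist())  # [2, 4, 6, 8]
```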
Activity 4. What are the two values returned by np.histogram()?¶
In [ ]:
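np.histogram() returns the count of values in each bin and the array of bin edges, which always has one more element than the counts. A small sketch on made-up values (not taken from the loan dataset):

```python
import numpy as np

# Made-up sample, for illustration only
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Three equal-width bins over the range [1, 4]; the last bin includes its right edge
counts, bin_edges = np.histogram(values, bins=3)

print(counts.tolist())     # [1, 2, 7]
print(bin_edges.tolist())  # [1.0, 2.0, 3.0, 4.0]
```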
Activity 5. Plot basic line graphs.¶
In [3]:
fig, ax = plt.subplots()
x = np.arange(10)
y = x**2
ax.plot(x, y)
ax.set_title('Line Chart of x^2')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')
plt.show()
Practice Activities¶
Load the dataset home_loan_data.csv to perform the following activities.
Activity 6. Line Chart of Cumulative Applicant Income¶
In [5]:
data = np.cumsum(df.loc[:49, 'ApplicantIncome'])
fig, ax = plt.subplots(figsize = (15, 8))
ax.plot(df.loc[:49,'ID'], data, color = 'blue', marker = 'o', linestyle = '-')
ax.set_xlabel('Loan ID')
ax.set_ylabel('Cumulative Applicant Income')
ax.set_title('Cumulative Applicant Income Trend (First 50 Loans)')
ax.tick_params(axis="x", rotation=45)  # rotate the Loan ID labels so they do not overlap
plt.show()
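As a rough illustration of what np.cumsum contributes in Activity 6, the sketch below uses three made-up income values rather than rows from home_loan_data.csv:

```python
import numpy as np

# Made-up incomes, for illustration only
incomes = np.array([3000, 2500, 5000])

# Each element is the sum of all incomes up to and including that position
running_total = np.cumsum(incomes)

print(running_total.tolist())  # [3000, 5500, 10500]
```

The last element of the cumulative array equals the total of the inputs, which is why the plotted line never decreases for non-negative incomes.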
Activity 7. Scatter Plot - Loan Amount vs. Applicant Income with Log Scaling¶
In [7]:
loan_amount = df['LoanAmount'].to_numpy()
y = np.log(np.nan_to_num(loan_amount, nan = np.nanmean(loan_amount)))
x = np.log(df['ApplicantIncome'])
fig, ax = plt.subplots()
ax.scatter(x, y, color = 'red', alpha = 0.5)
ax.set_xlabel('Log of Applicant Income')
ax.set_ylabel('Log of Loan Amount')
ax.set_title('Log-Scaled Loan Amount vs. Applicant Income')
plt.show()
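The missing-value handling in Activity 7 can be seen in isolation on a tiny made-up array: np.nanmean ignores NaN when computing the mean, and np.nan_to_num then substitutes that mean for each NaN before the log is taken.

```python
import numpy as np

# Made-up loan amounts with one missing value, for illustration only
loan_amount = np.array([100.0, np.nan, 200.0])

# Mean of the non-missing values (150.0) replaces the NaN
filled = np.nan_to_num(loan_amount, nan=np.nanmean(loan_amount))

# Natural log of strictly positive values is finite
log_amount = np.log(filled)

print(filled.tolist())  # [100.0, 150.0, 200.0]
```

Note that np.log returns -inf for a value of exactly 0, so a column containing zeros may need np.log1p or filtering instead; whether that applies to ApplicantIncome in this dataset is not shown in the notebook.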
Activity 8. Bar Chart - Loan Approvals by Property Area¶
In [9]:
df['Loan_Status'].value_counts()
data = df.loc[df['Loan_Status'] == 'Y', 'Property_Area'].value_counts()
data
fig, ax = plt.subplots()
ax.bar(data.index, data, color = ('blue', 'red', 'green'))
ax.set_xlabel('Property Area')
ax.set_ylabel('Number of Approved Loans')
ax.set_title('Loan Approvals by Property Area')
plt.show()
Activity 9. Histogram - Distribution of Loan Amounts¶
In [11]:
#loan_amount = df['LoanAmount'].fillna(df['LoanAmount'].median())
loan_amount = np.nan_to_num(df['LoanAmount'], nan = np.nanmedian(df['LoanAmount']))
hist_values, bin_edges = np.histogram(loan_amount, bins = 10)
fig, ax = plt.subplots()
ax.hist(loan_amount, bins = bin_edges, edgecolor = 'black', alpha = 0.7)
ax.set_title('Distribution of Loan Amounts')
ax.set_xlabel('Loan Amount')
ax.set_ylabel('Frequency')
plt.show()
Activity 10. Box Plot - Applicant Income Distribution¶
In [13]:
df.columns
Out[13]:
Index(['ID', 'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status'], dtype='object')
In [21]:
applicant_income = df['ApplicantIncome'].dropna()
q1, median, q3 = np.quantile(applicant_income, [0.25, 0.5, 0.75])
print(q1, median, q3)
2920.0 3814.0 5703.0
In [20]:
fig, ax = plt.subplots()
ax.boxplot(applicant_income)
ax.set_title('Applicant Income Distribution')
ax.set_ylabel('Income')
plt.show()
Activity 11. Pie Chart - Loan Status Distribution¶
In [28]:
loan_status = df['Loan_Status'].value_counts()
In [29]:
fig, ax = plt.subplots()
ax.pie(loan_status.values, labels = loan_status.index, autopct = '%1.1f%%', colors = ['green', 'red'])
ax.set_title('Loan Status Distribution')
plt.show()
Out[29]:
Text(0.5, 1.0, 'Loan Status Distribution')
Activity 12. Bar Chart - Education Level vs. Average Loan Amount¶
In [62]:
df['Education'].value_counts()
Out[62]:
Education
Graduate        331
Not Graduate     98
Name: count, dtype: int64
In [63]:
loan_amount_education = df.groupby('Education')['LoanAmount'].mean()
loan_amount_education
Out[63]:
Education
Graduate        147.818731
Not Graduate    120.387755
Name: LoanAmount, dtype: float64
In [66]:
education_levels = np.array(loan_amount_education.index)
average_loan = np.array(loan_amount_education.values)
fig, ax = plt.subplots()
ax.bar(education_levels, average_loan, color = ['blue', 'red'])
ax.set_xlabel('Education Level')
ax.set_ylabel('Average Loan Amount')
ax.set_title('Average Loan Amount by Education Level')
plt.show()
In [71]:
# Compute average loan amount per education category
education_groups = df.groupby("Education")["LoanAmount"].mean()
# Convert to NumPy arrays
education_levels = np.array(education_groups.index)
average_loan = np.array(education_groups.values)
# Plot Bar Chart
fig, ax = plt.subplots()
ax.bar(education_levels, average_loan, color=["blue", "red"])
ax.set_title("Average Loan Amount by Education Level")
ax.set_xlabel("Education Level")
ax.set_ylabel("Average Loan Amount")
plt.show()
Activity 13. Line Chart - Loan Amount vs. Loan Term¶
In [76]:
loan_amount_by_term = df.groupby('Loan_Amount_Term')['LoanAmount'].mean()
In [77]:
terms = np.array(loan_amount_by_term.index)
amount_mean = np.array(loan_amount_by_term.values)
fig, ax = plt.subplots()
ax.plot(terms, amount_mean, color = 'brown', marker = 'o', linestyle = '-')
ax.set_xlabel('Loan Term (Months)')
ax.set_ylabel('Average Loan Amount')
ax.set_title('Average Loan Amount by Loan Term')
plt.show()
Activity 14. In Activity 13, why is np.array() used while extracting loan terms and average loan amounts?¶
In [ ]:
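One plausible answer: np.array() turns the pandas Index and the Series values into plain NumPy arrays, so the plotting call receives simple numeric sequences with no pandas labels attached. The sketch below uses a small hand-made Series as a stand-in for loan_amount_by_term; its numbers are invented.

```python
import numpy as np
import pandas as pd

# Hand-made stand-in for the groupby result in Activity 13 (values are invented)
loan_amount_by_term = pd.Series(
    [100.0, 150.0, 140.0],
    index=pd.Index([120.0, 180.0, 360.0], name="Loan_Amount_Term"),
)

# The index is a pandas Index object; np.array() converts it to an ndarray
terms = np.array(loan_amount_by_term.index)

# .values is already an ndarray, so np.array() here simply makes a copy
amount_mean = np.array(loan_amount_by_term.values)

print(type(terms).__name__, type(amount_mean).__name__)  # ndarray ndarray
```

Matplotlib also accepts the Series and Index directly, so the conversion is a stylistic choice in keeping with the NumPy focus of the project rather than a requirement.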
Activity 15. Histogram - Distribution of Coapplicant Income¶
In [79]:
coapplicant_income = np.nan_to_num(df['CoapplicantIncome'], nan = 0)
hist_values, bin_edges = np.histogram(coapplicant_income, bins = 10)
fig, ax = plt.subplots()
ax.hist(coapplicant_income, bins = bin_edges, edgecolor = 'black', alpha = 0.7, color = 'orange')
ax.set_title('Distribution of Coapplicant Income')
ax.set_xlabel('Coapplicant Income')
ax.set_ylabel('Frequency')
plt.show()
Activity 16. Box Plot - Coapplicant Income Distribution by Marital Status¶
In [81]:
df.columns
Out[81]:
Index(['ID', 'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status'], dtype='object')
In [93]:
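# Calling dropna(inplace=True) on a selected column does not reliably modify df,
# so drop missing values from each group's Series instead
co_income_married_yes = df.loc[df['Married'] == 'Yes', 'CoapplicantIncome'].dropna()
co_income_married_no = df.loc[df['Married'] == 'No', 'CoapplicantIncome'].dropna()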
In [94]:
fig, ax = plt.subplots()
ax.boxplot([co_income_married_yes, co_income_married_no],
           tick_labels = ['Married', 'Not Married'])
ax.set_title('Coapplicant Income Distribution by Marital Status')
ax.set_ylabel('Coapplicant Income')
plt.show()
Activity 17. In Activity 16, what does the median line inside the box plot represent?¶
In [ ]:
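The line inside the box marks the median (50th percentile) of the group: half of the values lie below it and half above. The box edges are the first and third quartiles. A sketch on made-up coapplicant incomes:

```python
import numpy as np

# Made-up coapplicant incomes, for illustration only
incomes = np.array([0, 0, 1200, 1500, 2000, 2500, 9000])

# The median line drawn inside the box
median = np.median(incomes)

# The lower and upper edges of the box
q1, q3 = np.percentile(incomes, [25, 75])

print(q1, median, q3)  # 600.0 1500.0 2250.0
```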
Activity 18. If the box plot in Activity 16 shows dots outside the whiskers, what does it indicate?¶
In [ ]:
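Dots beyond the whiskers are outliers. With Matplotlib's default setting (whis=1.5), these are values more than 1.5 times the interquartile range below the first quartile or above the third. The same rule can be applied by hand to made-up data:

```python
import numpy as np

# Made-up coapplicant incomes, for illustration only
incomes = np.array([0, 0, 1200, 1500, 2000, 2500, 9000])

q1, q3 = np.percentile(incomes, [25, 75])
iqr = q3 - q1

# Limits beyond which a point is drawn as an individual dot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = incomes[(incomes < lower_fence) | (incomes > upper_fence)]

print(outliers.tolist())  # [9000]
```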
Activity 19. Subplots - Loan Amount vs. Applicant Income Across Loan Status¶
In [98]:
loan_yes = df.loc[df['Loan_Status'] == 'Y', 'LoanAmount']
loan_no = df.loc[df['Loan_Status'] == 'N', 'LoanAmount']
income_yes = df.loc[df['Loan_Status'] == 'Y', 'ApplicantIncome']
income_no = df.loc[df['Loan_Status'] == 'N', 'ApplicantIncome']
In [103]:
fig, ax = plt.subplots(nrows=1, ncols=2, figsize = (12,5), sharey = True)  # sharey takes a bool or 'none'/'all'/'row'/'col', not an Axes
ax[0].scatter(income_yes, loan_yes, color = 'green', alpha = 0.5)
ax[0].set_title('Approved Loans')
ax[0].set_xlabel('Applicant Income')
ax[0].set_ylabel('Loan Amount')
ax[1].scatter(income_no, loan_no, color = 'red', alpha = 0.5)
ax[1].set_title('Rejected Loans')
ax[1].set_xlabel('Applicant Income')
plt.show()
In [ ]:
Activity 20. Subplots - Loan Amount vs. Income with Different Property Areas¶
In [105]:
df['Property_Area'].value_counts()
Out[105]:
Property_Area
Semiurban    155
Urban        144
Rural        130
Name: count, dtype: int64
In [107]:
loan_semiurban = df.loc[df['Property_Area'] == 'Semiurban']
loan_urban = df.loc[df['Property_Area'] == 'Urban']
loan_rural = df.loc[df['Property_Area'] == 'Rural']
In [115]:
fig, ax = plt.subplots(nrows=1, ncols=3, figsize = (15,5), sharey = True)
ax[0].scatter(loan_urban['ApplicantIncome'], loan_urban['LoanAmount'], color = 'blue', alpha = 0.5)
ax[0].set_title('Urban Area')
ax[0].set_xlabel('Applicant Income')
ax[0].set_ylabel('Loan Amount')
ax[1].scatter(loan_rural['ApplicantIncome'], loan_rural['LoanAmount'], color = 'red', alpha = 0.5)
ax[1].set_title('Rural Area')
ax[1].set_xlabel('Applicant Income')
ax[2].scatter(loan_semiurban['ApplicantIncome'], loan_semiurban['LoanAmount'], color = 'green', alpha = 0.5)
ax[2].set_title('Semiurban Area')
ax[2].set_xlabel('Applicant Income')
plt.show()
Activity 21. Stacked Bar Chart - Property Area vs. Loan Status¶
In [121]:
df.dropna(subset=['Loan_Status', 'Property_Area'], inplace = True)
In [142]:
loan_yes = df.loc[df['Loan_Status'] == 'Y']
loan_no = df.loc[df['Loan_Status'] == 'N']
In [158]:
counts = {
    'Approved': [
        loan_yes[loan_yes['Property_Area'] == 'Rural'].shape[0],
        loan_yes[loan_yes['Property_Area'] == 'Semiurban'].shape[0],
        loan_yes[loan_yes['Property_Area'] == 'Urban'].shape[0],
    ],
    'Rejected': [
        loan_no[loan_no['Property_Area'] == 'Rural'].shape[0],
        loan_no[loan_no['Property_Area'] == 'Semiurban'].shape[0],
        loan_no[loan_no['Property_Area'] == 'Urban'].shape[0],
    ],
}
In [161]:
fig, ax = plt.subplots()
labels = ['Rural', 'Semiurban', 'Urban']
color = ['green', 'red']
bottom = np.zeros(3)
i = 0
for label, count in counts.items():
    # Draw each status on top of the bars already plotted
    p = ax.bar(labels, count, label=label, bottom=bottom, color = color[i])
    bottom += count
    i += 1
ax.set_xlabel('Property Area')
ax.set_ylabel('Number of Loans')
ax.set_title('Loan Approval by Property Area')
ax.legend()
plt.show()
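The per-area counts built by hand in Activity 21 could also be obtained in one step with pd.crosstab. This is an alternative sketch, not the approach the notebook uses, and the toy DataFrame below is invented for illustration.

```python
import pandas as pd

# Invented rows standing in for the Property_Area and Loan_Status columns
toy = pd.DataFrame({
    "Property_Area": ["Rural", "Rural", "Urban", "Urban",
                      "Semiurban", "Semiurban", "Semiurban"],
    "Loan_Status":   ["Y", "N", "Y", "Y", "Y", "N", "Y"],
})

# Rows are property areas, columns are loan statuses, cells are counts
table = pd.crosstab(toy["Property_Area"], toy["Loan_Status"])

approved = table["Y"].tolist()  # counts for Rural, Semiurban, Urban
rejected = table["N"].tolist()

print(list(table.index))   # ['Rural', 'Semiurban', 'Urban']
print(approved, rejected)  # [1, 2, 2] [1, 1, 0]
```

The two lists can then be passed to ax.bar in turn, with the first used as the bottom of the second, in the same way as the loop in the activity.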
Activity 22. Line Chart - Loan Amount vs. Credit History¶
In [166]:
loan_credit = df.groupby('Credit_History')['LoanAmount'].mean()
loan_credit
Out[166]:
Credit_History
0.0    137.224138
1.0    141.366460
Name: LoanAmount, dtype: float64
In [173]:
fig, ax = plt.subplots()
ax.plot(loan_credit.index, loan_credit.values, color = 'blue', marker = 'o', linestyle = '-')  # x positions come from the Credit_History values
ax.set_xlabel('Credit History (0 = No, 1 = Yes)')
ax.set_ylabel('Average Loan Amount')
ax.set_title('Loan Amount vs Credit History')
ax.set_xticks([0, 1])
plt.show()
In [ ]:
Solution-01.ipynb
In [ ]:
exec(open("utils.py").read())
Visualisation with NumPy and Matplotlib¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In [1], line 3
      1 import pandas as pd
      2 import numpy as np
----> 3 import matplotlib.pyplot as plt

File c:\Users\Abhishek\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\pyplot.py:60
     58 from matplotlib import _docstring
     59 from matplotlib.backend_bases import FigureCanvasBase, MouseButton
---> 60 from matplotlib.figure import Figure, FigureBase, figaspect
     61 from matplotlib.gridspec import GridSpec, SubplotSpec
     62 from matplotlib import rcParams, rcParamsDefault, get_backend, rcParamsOrig

File c:\Users\Abhishek\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\figure.py:40
     37 import numpy as np
     39 import matplotlib as mpl
---> 40 from matplotlib import _blocking_input, backend_bases, _docstring, projections
     41 from matplotlib.artist import (
     42     Artist, allow_rasterization, _finalize_rasterization)
     43 from matplotlib.backend_bases import (
     44     DrawEvent, FigureCanvasBase, NonGuiException, MouseButton, _get_renderer)

File c:\Users\Abhishek\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\projections\__init__.py:55
      1 """
      2 Non-separable transforms that map from data space to screen space.
      3 
   (...)
     52 `matplotlib.projections.polar` may also be of interest.
     53 """
---> 55 from .. import axes, _docstring
     56 from .geo import AitoffAxes, HammerAxes, LambertAxes, MollweideAxes
     57 from .polar import PolarAxes

ImportError: cannot import name 'axes' from 'matplotlib' (c:\Users\Abhishek\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\__init__.py)
In [2]:
# Load Dataset
df = pd.read_csv("home_loan_data.csv")
df.head(2)
Out[2]:
| | ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | LP001367 | Male | Yes | 1 | Graduate | No | 3052 | 1030.0 | 100.0 | 360.0 | 1.0 | Urban | Y |
1 | LP002517 | Male | Yes | 1 | Not Graduate | No | 2653 | 1500.0 | 113.0 | 180.0 | 0.0 | Rural | N |
Warm-up Activity¶
Activity 1: Title of the activity¶
In [ ]:
# Write your code here
Solution:
In [ ]:
placeholder = 3 # your solution, it'll be hidden later
Assertions:
In [ ]: