Classification in Depth with Scikit-Learn
Logistic Regression
The Logistic function¶
1- Let's code the logistic function
In [1]:
import matplotlib.pyplot as plt
import numpy as np
In [2]:
def logistic_function(x):
    return 1 / (1 + np.exp(-x))
2- Plot the logistic function with sample data.
In [3]:
x = np.linspace(-10,10,100)
y = logistic_function(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Logistic Function')
plt.show()
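As a quick sanity check on the function defined above, the sketch below verifies a few standard properties of the logistic curve: it equals 0.5 at x = 0, stays strictly between 0 and 1, and satisfies f(-x) = 1 - f(x). The comparison against `scipy.special.expit` is an addition not present in the original notebook and assumes SciPy is installed.

```python
import numpy as np
from scipy.special import expit  # SciPy's built-in logistic function (assumption: SciPy is available)

def logistic_function(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-10, 10, 100)
y = logistic_function(x)

# Value at the midpoint of the curve
midpoint = logistic_function(0.0)
# Outputs stay strictly inside (0, 1) on this range
is_bounded = bool(np.all((y > 0) & (y < 1)))
# Symmetry: f(-x) = 1 - f(x)
is_symmetric = bool(np.allclose(logistic_function(-x), 1 - y))
# Agreement with SciPy's implementation
matches_expit = bool(np.allclose(y, expit(x)))

print(midpoint, is_bounded, is_symmetric, matches_expit)
```

For very large negative inputs `np.exp(-x)` can overflow and emit a warning, which is one reason `expit` may be preferable in practice.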
Logistic Regression¶
3- Import the following from sklearn and statsmodels: LogisticRegression, train_test_split, and statsmodels.api (as sm).
In [4]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
4- Load the data that we will be using for this lab: diabetes.csv
In [5]:
df = pd.read_csv("diabetes.csv")
5- Are there any missing values in the data?
In [6]:
df.isnull().sum()
Out[6]:
Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
6- Split the data into train and test sets (30% test and 70% train) with a random_state=0.
In [7]:
X = df.drop("Outcome", axis=1)
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)
7- The features have different scales, so let's standardize the data.
In [8]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on the training set only, then apply the same transform to both sets
scaler.fit(X_train)
X_train_stand = scaler.transform(X_train)
X_test_stand = scaler.transform(X_test)
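To make explicit what the scaler does, here is a minimal sketch that reproduces `StandardScaler.transform` by hand. The data are hypothetical stand-ins generated with NumPy (not the diabetes dataset); the key point is that the mean and standard deviation come from the training set only and are then reused on the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in data: two features on very different scales
X_train = rng.normal(loc=[100.0, 0.5], scale=[30.0, 0.1], size=(200, 2))
X_test = rng.normal(loc=[100.0, 0.5], scale=[30.0, 0.1], size=(50, 2))

scaler = StandardScaler().fit(X_train)
X_test_stand = scaler.transform(X_test)

# Same computation by hand, using training-set statistics only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0), which StandardScaler uses
X_test_manual = (X_test - mean) / std

same_result = bool(np.allclose(X_test_stand, X_test_manual))
print(same_result)
```

Fitting the scaler on the training data alone avoids leaking information from the test set into the preprocessing step.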
8- Then create an instance of the logistic regression model and fit it to the training data.
In [9]:
logreg = LogisticRegression(solver = 'liblinear')
logreg.fit(X_train_stand,y_train)
Out[9]:
LogisticRegression(solver='liblinear')
9- Use the model to make predictions on the test data and evaluate its accuracy score.
In [10]:
y_pred = logreg.predict(X_test_stand)
print("Accuracy:", logreg.score(X_test_stand, y_test))
Accuracy: 0.7792207792207793
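Accuracy alone hides which kind of errors the model makes. The sketch below, on synthetic data from `make_classification` (an assumption, used so the example runs without diabetes.csv), shows that `score` matches `accuracy_score` on the predictions and that the same number can be read off the confusion matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the lab data (8 features, binary target)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0
)

model = LogisticRegression(solver="liblinear").fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", acc)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total number of test samples equals the accuracy.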
Perform model diagnostics.¶
10- You can use the summary() or summary2() method of a fitted statsmodels model to get detailed information such as the coefficients, standard errors, and p-values. This indicates the strength and statistical significance of the relationship between each independent variable and the dependent variable. Note that sm.Logit does not add an intercept automatically, so the model below is fitted without one.
In [11]:
logit_model=sm.Logit(y_train,X_train_stand)
result=logit_model.fit()
print(result.summary2())
Optimization terminated successfully.
         Current function value: 0.528464
         Iterations 6
                         Results: Logit
=================================================================
Model:              Logit            Pseudo R-squared: 0.192
Dependent Variable: Outcome          AIC:              583.5699
Date:               2025-01-25 15:00 BIC:              617.8579
No. Observations:   537              Log-Likelihood:   -283.78
Df Model:           7                LL-Null:          -351.27
Df Residuals:       529              LLR p-value:      5.7166e-26
Converged:          1.0000           Scale:            1.0000
No. Iterations:     6.0000
-----------------------------------------------------------------
       Coef.    Std.Err.      z      P>|z|    [0.025    0.975]
-----------------------------------------------------------------
x1     0.2682    0.1249    2.1466    0.0318    0.0233    0.5130
x2     1.0771    0.1440    7.4802    0.0000    0.7948    1.3593
x3    -0.2326    0.1197   -1.9426    0.0521   -0.4673    0.0021
x4     0.1166    0.1322    0.8820    0.3778   -0.1425    0.3757
x5    -0.1860    0.1272   -1.4620    0.1437   -0.4353    0.0633
x6     0.5976    0.1326    4.5075    0.0000    0.3378    0.8575
x7     0.2719    0.1161    2.3410    0.0192    0.0443    0.4995
x8     0.2450    0.1315    1.8633    0.0624   -0.0127    0.5027
=================================================================
/usr/local/lib/python3.11/site-packages/statsmodels/iolib/summary2.py:579: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
  dat = dat.applymap(lambda x: _formatter(x, float_format))
11- Many problems require a probability estimate rather than a class label. Estimate the class probabilities as shown:
In [12]:
logreg.predict_proba(X_train_stand)
Out[12]:
array([[0.41121528, 0.58878472],
       [0.97365877, 0.02634123],
       [0.66085209, 0.33914791],
       ...,
       [0.94046578, 0.05953422],
       [0.85330041, 0.14669959],
       [0.9013497 , 0.0986503 ]])
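Each row of the `predict_proba` output holds the probability of class 0 and class 1, in that order, and sums to 1. The following sketch, on synthetic data (an assumption, so it runs without diabetes.csv), shows how `predict` corresponds to thresholding the class-1 probability at 0.5, and how a stricter, hypothetical threshold of 0.7 can be applied instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data as a stand-in
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X, y)

proba = model.predict_proba(X)  # columns follow model.classes_, here [0, 1]
p_positive = proba[:, 1]        # probability of class 1

# Default decision rule: predict class 1 when its probability exceeds 0.5
default_pred = (p_positive > 0.5).astype(int)
# A stricter, hypothetical threshold that predicts class 1 less often
strict_pred = (p_positive > 0.7).astype(int)

print(default_pred.sum(), strict_pred.sum())
```

Raising the threshold trades fewer false positives for more false negatives, which can matter when the two kinds of error have different costs.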
12- Determine the coefficients of the logistic regression.
In [13]:
# Access the coefficients
coef = logreg.coef_
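For a binary problem, `coef_` has shape (1, n_features) and `intercept_` holds the bias term. Exponentiating a coefficient gives the odds ratio associated with a one-unit increase in that feature (one standard deviation, if the features were standardized). The sketch below illustrates this on synthetic data; the feature names are hypothetical placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = ["f1", "f2", "f3", "f4"]  # hypothetical names for illustration
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X, y)

# coef_ is 2D for binary problems: one row, one column per feature
coefs = pd.Series(model.coef_[0], index=feature_names)
# exp(coefficient) is the multiplicative change in the odds per unit increase
odds_ratios = np.exp(coefs)
intercept = model.intercept_[0]

print(pd.DataFrame({"coef": coefs, "odds_ratio": odds_ratios}))
print("Intercept:", intercept)
```

An odds ratio above 1 means the feature pushes predictions toward class 1, while a value below 1 pushes them toward class 0.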
Decision boundaries¶
13- Let's plot the decision boundaries of the model.
In [14]:
from mlxtend.plotting import plot_decision_regions
# Use only the first two standardized features (Pregnancies and Glucose)
X = X_train_stand[:, :2]
y = y_train.to_numpy()
# Refit the model on these two features so the regions can be drawn in 2D
logreg = LogisticRegression(solver='liblinear')
logreg.fit(X, y)
# Plotting decision regions
plot_decision_regions(X ,y, clf=logreg, legend=2)
# Adding axes annotations
plt.xlabel('Pregnancies')
plt.ylabel('Glucose')
plt.title('LogisticRegression')
plt.show()
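With two features, the decision boundary of a logistic regression is the straight line where w1*x1 + w2*x2 + b = 0, i.e. where the predicted probability is exactly 0.5. As a minimal sketch on synthetic blobs (an assumption, not the diabetes data), the line can be computed directly from `coef_` and `intercept_`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two synthetic 2D clusters as stand-in data
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X, y)

w1, w2 = model.coef_[0]
b = model.intercept_[0]

# Solve w1*x1 + w2*x2 + b = 0 for x2 (assumes w2 is not zero)
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
x2 = -(w1 * x1 + b) / w2
boundary_points = np.column_stack([x1, x2])

# The decision function should be approximately 0 on the boundary
scores = model.decision_function(boundary_points)
print(np.abs(scores).max())
```

Plotting `x1` against `x2` on top of a scatter plot of the data would draw the same boundary that separates the shaded regions.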
Activities¶
14- Fit a logistic regression model using the given input and target values. Once fitted, obtain the first coefficient of this model.
15- Fit a logistic regression model using the given input and output patterns X and y, respectively. Once fitted, determine the accuracy score on the test dataset.
In [59]:
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=1000, centers=2,
                  random_state=49, cluster_std=1.95)
In [60]:
X_test = [[0.77499332, 5.10445441],
          [2.33615249, 3.733497  ],
          [3.53929109, 1.13492994],
          [2.82894249, 4.00864077],
          [4.61800816, 2.39809546]]
y_test = [0, 1, 1, 0, 0]
In [61]:
X, y = make_blobs(n_samples=1000, centers=2,
                  random_state=49, cluster_std=1.95)
clf2 = LogisticRegression(random_state=0)
clf2.fit(X, y)
print("Accuracy:", clf2.score(X_test, y_test))
Accuracy: 0.4
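The cells above cover activity 15. For activity 14, one possible approach, assuming the "given input and target values" are the same `make_blobs` data, is to read the first entry of `coef_` after fitting. The name `first_coef` is introduced here for illustration only.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Same data-generating call as in the activity cells above
X, y = make_blobs(n_samples=1000, centers=2,
                  random_state=49, cluster_std=1.95)

clf = LogisticRegression(random_state=0)
clf.fit(X, y)

# coef_ has shape (1, 2) here: one row for the binary problem, one column per feature
first_coef = clf.coef_[0][0]
print("First coefficient:", first_coef)
```

The exact value can vary slightly with the scikit-learn version and solver settings, so no expected output is shown.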