Muru Gesh has successfully completed this project.

Classification in Depth with Scikit-Learn


Logistic Regression


January 25, 2025 3:16 PM


The Logistic Function

1- Let's code the logistic function

In [1]:

import matplotlib.pyplot as plt
import numpy as np

In [2]:

def logistic_function(x):
    return 1 / (1 + np.exp(-x))
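For large negative inputs, np.exp(-x) overflows and NumPy emits a RuntimeWarning, even though the result still rounds to 0. A minimal sketch of a numerically stable variant for float arrays follows. The name stable_logistic is illustrative. SciPy also provides this function as scipy.special.expit.

```python
import numpy as np

def stable_logistic(x):
    """Numerically stable logistic function 1 / (1 + exp(-x)) for arrays."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the standard form cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1 here
    exp_x = np.exp(x[~pos])
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

print(stable_logistic(np.array([-1000.0, 0.0, 1000.0])))
```

Both branches compute the same mathematical function. The split only decides which form avoids taking exp of a large positive number.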

2- Plot the logistic function with sample data.

In [3]:

x = np.linspace(-10,10,100)
y = logistic_function(x)

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Logistic Function')
plt.show()

(Figure: Logistic Function, y plotted against x for x from -10 to 10)

Logistic Regression

3- Import the following from scikit-learn and statsmodels: LogisticRegression, train_test_split, and statsmodels.api (as sm).

In [4]:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import statsmodels.api as sm

4- Load the data that we will be using for this lab: diabetes.csv

In [5]:

df = pd.read_csv("diabetes.csv")

5- Are there any missing values?

In [6]:

df.isnull().sum()

Out[6]:

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64
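isnull() only detects NaN values. The column names suggest that diabetes.csv is the Pima Indians Diabetes dataset, but that is an assumption. In that dataset, missing measurements are commonly recorded as 0 in columns where zero is not physiologically plausible. A minimal sketch that counts such zeros follows. The column list, the helper name count_zero_placeholders, and the demo values are all illustrative assumptions.

```python
import pandas as pd

# Columns where a value of 0 most likely encodes a missing measurement
# (assumption based on the Pima Indians Diabetes dataset)
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

def count_zero_placeholders(df, columns=ZERO_AS_MISSING):
    """Return the number of zeros in each listed column present in df."""
    present = [c for c in columns if c in df.columns]
    return (df[present] == 0).sum()

# Tiny frame with made-up values; in the notebook, call count_zero_placeholders(df)
demo = pd.DataFrame({
    "Glucose": [148, 0, 183],
    "BloodPressure": [72, 66, 0],
    "SkinThickness": [35, 0, 0],
    "Insulin": [0, 0, 0],
    "BMI": [33.6, 26.6, 23.3],
})
print(count_zero_placeholders(demo))
```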

6- Split the data into train and test sets (70% train and 30% test) with random_state=0.

In [7]:

X = df.drop("Outcome", axis=1)
y = df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)
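Here, train_size=0.7 is equivalent to test_size=0.3. When classes are imbalanced, passing stratify=y keeps the class proportions the same in both splits. A minimal sketch on synthetic labels follows. The 65/35 class ratio is illustrative and is not taken from diabetes.csv.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary labels: 650 zeros and 350 ones (illustrative)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([0] * 650 + [1] * 350)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.7, random_state=0, stratify=y_demo
)

# With stratify, the share of class 1 is the same in both splits
print("train share of class 1:", y_tr.mean())
print("test share of class 1:", y_te.mean())
```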

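7- The features have different scales, so let's standardize the data. Fit the scaler on the training set only, then apply it to both sets. This way, no information from the test set leaks into the preprocessing.

In [8]:

from sklearn.preprocessing import StandardScaler

# Learn each feature's mean and standard deviation from the training data only
scaler = StandardScaler()
scaler.fit(X_train)
X_train_stand = scaler.transform(X_train)
X_test_stand = scaler.transform(X_test)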
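StandardScaler computes z = (x - mean) / std for each column. It uses the training mean and the population standard deviation (ddof=0). A minimal sketch on synthetic data follows, checking that the manual formula matches the scaler's output. The two features and their scales are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two synthetic features on very different scales (illustrative)
X_tr = np.column_stack([rng.normal(3, 1, 200), rng.normal(120, 30, 200)])
X_te = np.column_stack([rng.normal(3, 1, 50), rng.normal(120, 30, 50)])

scaler = StandardScaler().fit(X_tr)

# Manual version: the statistics come from the training set only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)  # ddof=0, like StandardScaler
X_te_manual = (X_te - mean) / std

print(np.allclose(scaler.transform(X_te), X_te_manual))
```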

8- Then create an instance of the logistic regression model and fit it to the training data.

In [9]:

logreg = LogisticRegression(solver = 'liblinear')
logreg.fit(X_train_stand,y_train)

Out[9]:

LogisticRegression(solver='liblinear')


9- Use the model to make predictions on the test data and evaluate its accuracy score.

In [10]:

# Predict class labels for the standardized test set
y_pred = logreg.predict(X_test_stand)

# score() returns the mean accuracy on the given data and labels
print("Accuracy:", logreg.score(X_test_stand, y_test))

Accuracy: 0.7792207792207793
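For classifiers, score() returns the mean accuracy, which equals sklearn.metrics.accuracy_score(y_test, y_pred). A confusion matrix also shows which classes the errors fall into. A minimal sketch follows, using synthetic data from make_classification rather than the diabetes data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary problem (illustrative, not diabetes.csv)
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, train_size=0.7, random_state=0)

model = LogisticRegression(solver="liblinear").fit(X_tr, y_tr)
y_hat = model.predict(X_te)

# model.score() and accuracy_score() compute the same quantity
print("score():", model.score(X_te, y_te))
print("accuracy_score():", accuracy_score(y_te, y_hat))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
print(cm)
```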

Perform Model Diagnostics

10- The summary() and summary2() methods of a fitted statsmodels result report detailed information about the model, such as the coefficients, standard errors, z-statistics, and p-values. These values indicate the strength and statistical significance of the relationship between each independent variable and the dependent variable.

In [11]:

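# Fit the standardized features with statsmodels to get inference statistics.
# No constant column is added, so this model has no intercept term.
logit_model = sm.Logit(y_train, X_train_stand)
result = logit_model.fit()
print(result.summary2())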

Optimization terminated successfully.
         Current function value: 0.528464
         Iterations 6
                         Results: Logit
=================================================================
Model:              Logit            Pseudo R-squared: 0.192     
Dependent Variable: Outcome          AIC:              583.5699  
Date:               2025-01-25 15:00 BIC:              617.8579  
No. Observations:   537              Log-Likelihood:   -283.78   
Df Model:           7                LL-Null:          -351.27   
Df Residuals:       529              LLR p-value:      5.7166e-26
Converged:          1.0000           Scale:            1.0000    
No. Iterations:     6.0000                                       
--------------------------------------------------------------------
        Coef.     Std.Err.       z       P>|z|      [0.025    0.975]
--------------------------------------------------------------------
x1      0.2682      0.1249     2.1466    0.0318     0.0233    0.5130
x2      1.0771      0.1440     7.4802    0.0000     0.7948    1.3593
x3     -0.2326      0.1197    -1.9426    0.0521    -0.4673    0.0021
x4      0.1166      0.1322     0.8820    0.3778    -0.1425    0.3757
x5     -0.1860      0.1272    -1.4620    0.1437    -0.4353    0.0633
x6      0.5976      0.1326     4.5075    0.0000     0.3378    0.8575
x7      0.2719      0.1161     2.3410    0.0192     0.0443    0.4995
x8      0.2450      0.1315     1.8633    0.0624    -0.0127    0.5027
=================================================================



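11- Many problems require a probability estimate rather than just a class label. Estimate the class probabilities as follows (column 0 is the probability of Outcome = 0, column 1 the probability of Outcome = 1):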

In [12]:

logreg.predict_proba(X_train_stand)

Out[12]:

array([[0.41121528, 0.58878472],
       [0.97365877, 0.02634123],
       [0.66085209, 0.33914791],
       ...,
       [0.94046578, 0.05953422],
       [0.85330041, 0.14669959],
       [0.9013497 , 0.0986503 ]])
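For a binary model, the class-1 column of predict_proba is the logistic function from step 1 applied to the linear score returned by decision_function, that is, coef_ · x + intercept_. predict() returns class 1 when that probability exceeds 0.5, which is the same as the score being positive. A minimal sketch on synthetic data follows, checking these relationships.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def logistic_function(x):
    return 1 / (1 + np.exp(-x))

# Synthetic binary problem (illustrative)
X_demo, y_demo = make_classification(n_samples=300, n_features=4, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

proba = model.predict_proba(X_demo)
# Linear score X @ coef_.T + intercept_ (one value per row for binary problems)
scores = model.decision_function(X_demo)

# Column 1 is P(class 1) = logistic(score); the two columns sum to 1
print(np.allclose(proba[:, 1], logistic_function(scores)))
print(np.allclose(proba.sum(axis=1), 1.0))

# predict() picks class 1 when P(class 1) > 0.5, i.e. when the score > 0
print(np.array_equal(model.predict(X_demo), (scores > 0).astype(int)))
```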

12- Determine the coefficients of the logistic regression model.

In [13]:

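# Access the coefficients: shape (1, n_features) for a binary problem,
# in the same order as the training columns
coef = logreg.coef_
# The fitted intercept (bias) term
intercept = logreg.intercept_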
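Because the features were standardized, each coefficient is the change in log-odds for a one-standard-deviation increase in that feature, holding the others fixed. np.exp turns each coefficient into an odds ratio. A minimal sketch that pairs coefficients with feature names follows. The helper coefficient_table, the synthetic data, and the names f0 to f2 are illustrative; in the notebook you would pass X.columns.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def coefficient_table(model, feature_names):
    """Pair each coefficient with its feature name and its odds ratio."""
    coefs = model.coef_.ravel()  # shape (n_features,) for a binary model
    return pd.DataFrame(
        {"coef": coefs, "odds_ratio": np.exp(coefs)},
        index=list(feature_names),
    ).sort_values("coef", ascending=False)

# Synthetic binary problem (illustrative)
X_demo, y_demo = make_classification(n_samples=300, n_features=3, n_informative=2,
                                     n_redundant=0, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

print("intercept:", model.intercept_[0])
table = coefficient_table(model, ["f0", "f1", "f2"])
print(table)
```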

Decision Boundaries

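13- Let's plot the decision boundaries of the model. A 2-D plot can only show two features, so we refit the model on the first two standardized features (Pregnancies and Glucose).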

In [14]:

from mlxtend.plotting import plot_decision_regions

# Refit on two features only (standardized Pregnancies and Glucose) so the
# regions can be drawn in 2-D; new names avoid overwriting X, y and logreg
X_2d = X_train_stand[:, :2]
y_2d = y_train.to_numpy()
logreg_2d = LogisticRegression(solver='liblinear')
logreg_2d.fit(X_2d, y_2d)

# Plotting decision regions
plot_decision_regions(X_2d, y_2d, clf=logreg_2d, legend=2)

# Adding axes annotations
plt.xlabel('Pregnancies (standardized)')
plt.ylabel('Glucose (standardized)')
plt.title('LogisticRegression')
plt.show()
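With two features, the decision boundary is the line where the linear score is zero: w0 + w1·x1 + w2·x2 = 0, that is, x2 = -(w0 + w1·x1) / w2 (assuming w2 is not zero). A minimal sketch follows that computes this line from intercept_ and coef_, then checks that points on it get probability 0.5. The synthetic blobs and the helper boundary_x2 are illustrative.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Two synthetic clusters (illustrative)
X_demo, y_demo = make_blobs(n_samples=200, centers=2, random_state=0)
model = LogisticRegression(solver="liblinear").fit(X_demo, y_demo)

w0 = model.intercept_[0]
w1, w2 = model.coef_[0]

def boundary_x2(x1):
    """Second coordinate of the decision boundary for a given first coordinate."""
    return -(w0 + w1 * x1) / w2

x1_line = np.linspace(X_demo[:, 0].min(), X_demo[:, 0].max(), 5)
points = np.column_stack([x1_line, boundary_x2(x1_line)])

# Points on the boundary sit at P(class 1) = 0.5
print(model.predict_proba(points)[:, 1])
```

The same two coordinates, x1_line and boundary_x2(x1_line), could be passed to plt.plot to overlay the line on a scatter plot.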

Activities

14- Fit a logistic regression model using the given input and target values. Once fitted, obtain the first coefficient of this model.
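Activity 14 does not include a solution cell. A minimal sketch follows. It assumes the "given input and target values" are the make_blobs data generated for activity 15 with the same parameters, and the names X_blobs, y_blobs, and clf1 are illustrative. coef_[0][0] is the coefficient of the first feature.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Assumption: reuse the same synthetic data as activity 15
X_blobs, y_blobs = make_blobs(n_samples=1000, centers=2,
                              random_state=49, cluster_std=1.95)

clf1 = LogisticRegression(random_state=0)
clf1.fit(X_blobs, y_blobs)

# coef_ has shape (1, n_features); take the first feature's coefficient
first_coef = clf1.coef_[0][0]
print("First coefficient:", first_coef)
```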

15- Fit a logistic regression model using the given input and output patterns X and y, respectively. Once fitted, determine the accuracy score on the test dataset.


In [59]:

from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=1000, centers=2,
                  random_state=49, cluster_std=1.95)

In [60]:

X_test = [[0.77499332, 5.10445441],
          [2.33615249, 3.733497  ],
          [3.53929109, 1.13492994],
          [2.82894249, 4.00864077],
          [4.61800816, 2.39809546]]

y_test = [0, 1, 1, 0, 0]

In [61]:

# X and y come from make_blobs in In [59]; fit on all 1000 generated points
clf2 = LogisticRegression(random_state=0)
clf2.fit(X, y)

# Evaluate on the five given test points
print("Accuracy:", clf2.score(X_test, y_test))

Accuracy: 0.4
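An accuracy computed on five points is a very noisy estimate, because a single misclassification changes it by 0.2. A minimal sketch of a more reliable check follows, holding out 30% of the 1000 generated points for testing. The split ratio mirrors step 6 and is otherwise an arbitrary choice. The names X_blobs, y_blobs, and clf are illustrative.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_blobs, y_blobs = make_blobs(n_samples=1000, centers=2,
                              random_state=49, cluster_std=1.95)

# Hold out 30% of the generated points for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_blobs, y_blobs, test_size=0.3, random_state=0, stratify=y_blobs
)

clf = LogisticRegression(random_state=0).fit(X_tr, y_tr)
print("Held-out accuracy:", clf.score(X_te, y_te))
```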