Logistic Regression in Python Tutorial

Logistic Regression is one of the most popular machine learning algorithms used for classification problems. Despite its name, Logistic Regression is used to predict categorical outcomes rather than continuous values. It is widely applied in spam detection, customer churn prediction, disease diagnosis, sentiment analysis, and many other classification tasks.

In this tutorial, you will learn the fundamentals of Logistic Regression, how it works, and how to implement it in Python using Scikit-Learn.

What is Logistic Regression?

Logistic Regression is a supervised machine learning algorithm used for predicting discrete classes.

Unlike Linear Regression, which predicts numerical values, Logistic Regression predicts probabilities that can be mapped to class labels.

For example:

Email is Spam or Not Spam
Customer Will Buy or Not Buy
Student Passes or Fails
Disease Positive or Negative

The output probability ranges between 0 and 1.

How Logistic Regression Works

Logistic Regression passes a linear combination of the input features through the sigmoid function, which maps any real number to a probability between 0 and 1.

The sigmoid function is:

$P(y=1)=\frac{1}{1+e^{-z}}$

Where:

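P(y=1) = Probability of the positive class
e = Euler's number (approximately 2.718)
z = Linear combination of the features, $z = b + w_1 x_1 + \dots + w_n x_n$, where b is the intercept and each w_i is a learned weight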

If the probability is greater than 0.5, the sample is usually assigned to Class 1; otherwise, it is assigned to Class 0.
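To make the formula concrete, here is a minimal sketch of the sigmoid and the 0.5 threshold in NumPy. The intercept and weight are hypothetical values chosen for illustration, not parameters learned from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters for a one-feature model: z = intercept + weight * hours
intercept, weight = -4.5, 1.0
hours = np.array([2.0, 4.5, 8.0])

z = intercept + weight * hours            # linear combination of features
probabilities = sigmoid(z)                # P(y=1) for each value of hours
labels = (probabilities > 0.5).astype(int)  # apply the 0.5 threshold

print(probabilities)
print(labels)
```

Note that z = 0 gives a probability of exactly 0.5, so the point where the linear combination changes sign is the decision boundary.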

Installing Required Libraries

Install the necessary Python packages:


pip install numpy pandas matplotlib scikit-learn

Importing Required Modules


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Creating a Sample Dataset

Let's create a simple dataset representing study hours and exam results.


data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

print(df)

Output:


   Hours  Pass
0      1     0
1      2     0
2      3     0
3      4     0
4      5     1
...

Preparing Features and Labels

Features are input variables, while labels are target outputs.


X = df[['Hours']]
y = df['Pass']
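The double brackets in df[['Hours']] matter: Scikit-Learn expects the features as a two-dimensional table and the labels as a one-dimensional sequence. The short check below, which rebuilds the same sample dataset, shows the resulting shapes.

```python
import pandas as pd

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

X = df[['Hours']]  # double brackets return a DataFrame (2-D)
y = df['Pass']     # single brackets return a Series (1-D)

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
print(y.value_counts().sort_index())  # number of samples per class
```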

Splitting Data into Training and Testing Sets

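Machine learning models should be evaluated on unseen data. Here, test_size=0.2 holds out 20% of the rows for testing, and random_state=42 makes the split reproducible.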


X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)
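As a quick sanity check, the sketch below repeats the split on the same sample dataset and prints how many rows land in each set. It also repeats the split with the same random_state to show that the result is reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X = df[['Hours']]
y = df['Pass']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", len(X_train))
print("Testing rows:", len(X_test))
print(X_test.assign(Pass=y_test))  # the held-out rows with their labels

# Splitting again with the same seed selects the same rows.
_, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print("Same split:", X_test.equals(X_test_again))
```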

Training the Logistic Regression Model

Create and train the model.


model = LogisticRegression()

model.fit(X_train, y_train)

The model learns the relationship between study hours and exam results.
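What the model has learned can be inspected through its coef_ and intercept_ attributes. The sketch below refits the same model, computes the number of study hours at which the predicted probability crosses 0.5, and reproduces the model's probabilities by applying the sigmoid by hand. The variable names are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X_train, X_test, y_train, y_test = train_test_split(
    df[['Hours']], df['Pass'], test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

weight = model.coef_[0][0]       # learned weight for 'Hours'
intercept = model.intercept_[0]  # learned intercept

# z = intercept + weight * hours is zero at the decision boundary.
boundary = -intercept / weight
print("Weight:", weight)
print("Intercept:", intercept)
print("Hours where P(pass) = 0.5:", boundary)

# Applying the sigmoid by hand reproduces predict_proba.
hours_test = X_test['Hours'].to_numpy()
manual = 1.0 / (1.0 + np.exp(-(intercept + weight * hours_test)))
print(np.allclose(manual, model.predict_proba(X_test)[:, 1]))
```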

Making Predictions

Predict class labels for test data.


predictions = model.predict(X_test)

print(predictions)

Example output:


[1 0]

Predicting Probabilities

Logistic Regression can provide probabilities for each class.


probabilities = model.predict_proba(X_test)

print(probabilities)

Example output:


[[0.12 0.88]
 [0.91 0.09]]

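Interpretation:

Each row corresponds to one test sample. The first column is the probability of class 0 (fail) and the second is the probability of class 1 (pass).

First sample: 88% chance of passing
Second sample: 9% chance of passing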
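The column order of predict_proba follows model.classes_, and the probabilities can be thresholded manually when 0.5 is not the right cutoff. The sketch below shows both; the stricter threshold of 0.7 is a hypothetical choice for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X_train, X_test, y_train, y_test = train_test_split(
    df[['Hours']], df['Pass'], test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

probabilities = model.predict_proba(X_test)
print("Column order:", model.classes_)        # class label for each column
print("Row sums:", probabilities.sum(axis=1))  # each row sums to 1

pass_probability = probabilities[:, 1]  # probability of class 1 (pass)

# The default prediction corresponds to a 0.5 threshold.
default_labels = (pass_probability > 0.5).astype(int)

# Hypothetical stricter rule: predict "pass" only at 70% or more.
strict_labels = (pass_probability >= 0.7).astype(int)

print("Default:", default_labels)
print("Strict: ", strict_labels)
```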

Evaluating Model Accuracy

Calculate prediction accuracy.


accuracy = accuracy_score(y_test, predictions)

print("Accuracy:", accuracy)

Output:


Accuracy: 1.0

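A score of 1.0 means every test sample was classified correctly. With only two test samples here, this says little about how the model would perform on new data.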

Complete Logistic Regression Example


import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = {
    'Hours': [1,2,3,4,5,6,7,8,9,10],
    'Pass':  [0,0,0,0,1,1,1,1,1,1]
}

df = pd.DataFrame(data)

X = df[['Hours']]
y = df['Pass']

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)

model = LogisticRegression()

model.fit(X_train, y_train)

predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)

print("Predictions:", predictions)
print("Accuracy:", accuracy)

Using the Iris Dataset

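The Iris dataset is a classic machine learning dataset with 150 samples, four numeric features, and three flower species, which makes it a multiclass classification problem.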


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)

model = LogisticRegression(max_iter=200)

model.fit(X_train, y_train)

predictions = model.predict(X_test)

print("Accuracy:",
      accuracy_score(y_test, predictions))
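With three classes, predict_proba returns one column per species, and the predicted label is the class with the highest probability. The sketch below prints this for the first three test samples, using the same split and settings as above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

sample = X_test[:3]
probabilities = model.predict_proba(sample)  # shape: (3 samples, 3 classes)
predicted = model.predict(sample)

print("Classes:", iris.target_names)
for row, label in zip(probabilities, predicted):
    print(iris.target_names[label], row.round(3))
```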

Important Evaluation Metrics

Beyond accuracy, Logistic Regression can be evaluated using:

Precision

Measures the fraction of positive predictions that were actually positive: true positives divided by all predicted positives.


from sklearn.metrics import precision_score

Recall

Measures the fraction of actual positives that the model identified: true positives divided by all actual positives.


from sklearn.metrics import recall_score

F1 Score

The harmonic mean of precision and recall, useful when a single number must balance both.


from sklearn.metrics import f1_score

Confusion Matrix

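Tabulates actual classes against predicted classes, giving the counts of true negatives, false positives, false negatives, and true positives.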


from sklearn.metrics import confusion_matrix
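The four metrics can be computed together. The sketch below uses small hand-made label lists (hypothetical values, not output from the models above) so that each number can be checked by counting.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 3 true positives, 2 false positives,
# 1 false negative, 2 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

precision = precision_score(y_true, y_pred)  # 3 / (3 + 2) = 0.6
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
matrix = confusion_matrix(y_true, y_pred)    # rows: actual, columns: predicted

print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print(matrix)
```

For binary labels, the confusion matrix is laid out as [[TN, FP], [FN, TP]], so here it is [[2, 2], [1, 3]].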

Advantages of Logistic Regression

Easy to understand and implement
Fast training process
Works well when classes are approximately linearly separable
Provides probability outputs
Suitable for binary and multiclass classification
Requires fewer computational resources

Limitations of Logistic Regression

Assumes a linear relationship between the features and the log-odds of the outcome
Less effective for highly complex data
Sensitive to outliers
Performance decreases with non-linear patterns unless suitable features are engineered
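The effect of a non-linear pattern can be illustrated with a synthetic dataset. The sketch below uses Scikit-Learn's make_circles, where one class forms a ring around the other, so no straight line separates them. The add_radius helper is a hypothetical hand-made feature added for this illustration only.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two concentric rings: a pattern no straight line can separate.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Plain Logistic Regression on the raw coordinates.
linear_model = LogisticRegression().fit(X_train, y_train)
linear_score = linear_model.score(X_test, y_test)

def add_radius(features):
    """Append the squared distance from the origin as a third feature."""
    radius_squared = (features ** 2).sum(axis=1, keepdims=True)
    return np.hstack([features, radius_squared])

# The same model with the extra feature added.
radial_model = LogisticRegression().fit(add_radius(X_train), y_train)
radial_score = radial_model.score(add_radius(X_test), y_test)

print("Raw features:", linear_score)
print("With squared radius:", radial_score)
```

On the raw coordinates the model does little better than guessing, while the added feature makes the classes separable by a linear boundary again.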

Real-World Applications

Logistic Regression is used in:

Email spam detection
Fraud detection
Disease diagnosis
Credit risk assessment
Customer churn prediction
Sentiment analysis
Marketing campaign response prediction

Best Practices

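Scale features when they are on very different ranges; this helps the solver converge and lets regularization treat all features evenly.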
Remove irrelevant variables.
Handle missing values properly.
Evaluate using multiple metrics.
Use cross-validation for reliable results.
Monitor class imbalance issues.
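Two of these practices, feature scaling and cross-validation, can be combined in a few lines. The sketch below is one possible setup on the Iris dataset, assuming a StandardScaler and 5-fold cross-validation; other scalers and fold counts are equally valid.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# The pipeline refits the scaler on each training fold,
# so no information leaks from the validation fold.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=200),
)

scores = cross_val_score(pipeline, X, y, cv=5)  # accuracy for each fold

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Reporting the mean across folds gives a steadier estimate than a single train/test split, especially on small datasets.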

Conclusion

Logistic Regression is one of the most important classification algorithms in machine learning. It is simple, efficient, interpretable, and widely used in real-world applications. By combining Logistic Regression with Python and Scikit-Learn, developers can quickly build powerful predictive models for binary and multiclass classification tasks. Understanding Logistic Regression provides a strong foundation for learning more advanced machine learning algorithms and building intelligent data-driven applications.


Posted by: Roger John Williams
