Logistic Regression in Python Tutorial – Classification with Scikit-Learn

Logistic Regression is one of the most popular machine learning algorithms used for classification problems. Despite its name, Logistic Regression is used to predict categorical outcomes rather than continuous values. It is widely applied in spam detection, customer churn prediction, disease diagnosis, sentiment analysis, and many other classification tasks.

In this tutorial, you will learn the fundamentals of Logistic Regression, how it works, and how to implement it in Python using Scikit-Learn.


What is Logistic Regression?

Logistic Regression is a supervised machine learning algorithm used for predicting discrete classes.

Unlike Linear Regression, which predicts numerical values, Logistic Regression predicts probabilities that can be mapped to class labels.

For example:

  • Email is Spam or Not Spam
  • Customer Will Buy or Not Buy
  • Student Passes or Fails
  • Disease Positive or Negative

The output probability ranges between 0 and 1.


How Logistic Regression Works

Logistic Regression uses the Sigmoid Function to transform predictions into probabilities.

The sigmoid function is:

P(y=1) = \frac{1}{1 + e^{-z}}

Where:

  • P(y=1) = Probability of positive class
  • e = Euler's number
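  • z = Linear combination of the input features, z = b + w1·x1 + ... + wn·xn, where the weights w and the bias b are learned from the training data

If the probability is greater than 0.5, the sample is usually assigned to Class 1; otherwise, it is assigned to Class 0.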
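
The sigmoid and the 0.5 threshold can be sketched in a few lines of NumPy. This is only an illustration: the weight `w`, the bias `b`, and the `hours` values are hypothetical numbers chosen by hand, not parameters learned from data.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weight and bias, chosen only for illustration
w, b = 1.2, -5.0
hours = np.array([1.0, 4.0, 5.0, 9.0])

z = w * hours + b               # linear combination of the feature
p = sigmoid(z)                  # P(y=1) for each input
labels = (p > 0.5).astype(int)  # apply the 0.5 threshold

print(np.round(p, 3))
print(labels)  # [0 0 1 1]
```

Inputs with a negative z fall below 0.5 and are labelled 0, while inputs with a positive z are labelled 1.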


Installing Required Libraries

Install the necessary Python packages:

pip install numpy pandas matplotlib scikit-learn

Importing Required Modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Creating a Sample Dataset

Let's create a simple dataset representing study hours and exam results.

data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

print(df)

Output:

   Hours  Pass
0      1     0
1      2     0
2      3     0
3      4     0
4      5     1
...

Preparing Features and Labels

Features are input variables, while labels are target outputs.

X = df[['Hours']]
y = df['Pass']

Splitting Data into Training and Testing Sets

Machine learning models should be evaluated on unseen data.

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)
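
To see what the split produces, the sketch below rebuilds the same ten-row dataset and prints the size of each subset. The `stratify=y` argument is an optional addition that is not part of the tutorial's own call; it asks `train_test_split` to keep the pass/fail ratio similar in both subsets, which can matter on very small datasets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X = df[['Hours']]
y = df['Pass']

# stratify=y is an optional extra, added here for illustration only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("train rows:", len(X_train))  # 8
print("test rows:", len(X_test))    # 2
print("test labels:", sorted(y_test.tolist()))
```

With only two test rows, any evaluation on this split is a rough check rather than a dependable estimate.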

Training the Logistic Regression Model

Create and train the model.

model = LogisticRegression()

model.fit(X_train, y_train)

The model learns the relationship between study hours and exam results.
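
One way to see what the model has learned is to read its fitted weight and intercept. The sketch below fits on all ten rows, a simplification made here so the example is self-contained. It then computes the number of study hours at which the predicted probability crosses 0.5, which is where z = w * hours + b equals zero.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Fit on all ten rows, purely to inspect the learned parameters
model = LogisticRegression()
model.fit(df[['Hours']], df['Pass'])

w = model.coef_[0][0]    # weight on Hours
b = model.intercept_[0]  # bias term

# P(y=1) = 0.5 exactly where w * hours + b = 0
boundary = -b / w

print("weight:", w)
print("intercept:", b)
print("decision boundary (hours):", boundary)
```

A positive weight means each extra hour of study raises the predicted probability of passing.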


Making Predictions

Predict class labels for test data.

predictions = model.predict(X_test)

print(predictions)

Example output:

[1 0]

Predicting Probabilities

Logistic Regression can provide probabilities for each class.

probabilities = model.predict_proba(X_test)

print(probabilities)

Example output:

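[[0.12 0.88]
 [0.91 0.09]]

Interpretation:

Each row corresponds to one test sample. The first column is the probability of class 0 (fail) and the second column is the probability of class 1 (pass).

  • First sample: 88% chance of passing
  • Second sample: 9% chance of passing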
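
The link between these probabilities and the sigmoid function can be checked directly. In the sketch below, the model is fitted on all ten rows for brevity, and the 0.8 threshold is a hypothetical value chosen only to show how probabilities allow a stricter decision rule than the default 0.5.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass':  [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
X, y = df[['Hours']], df['Pass']
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X)  # columns follow model.classes_
p_pass = proba[:, 1]            # probability of class 1 (pass)

# decision_function returns z; applying the sigmoid should give p_pass
z = model.decision_function(X)
manual = 1.0 / (1.0 + np.exp(-z))

print(model.classes_)               # [0 1]
print(np.allclose(p_pass, manual))  # True

# Hypothetical stricter rule: predict "pass" only at 80% confidence or more
strict = (p_pass >= 0.8).astype(int)
print(strict)
```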

Evaluating Model Accuracy

Calculate prediction accuracy.

accuracy = accuracy_score(y_test, predictions)

print("Accuracy:", accuracy)

Output:

Accuracy: 1.0

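A score of 1.0 means 100% accuracy on the test data. Because test_size=0.2 leaves only two of the ten rows for testing, this score is a quick sanity check rather than a reliable estimate of real-world performance.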


Complete Logistic Regression Example

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Study hours and exam results (1 = pass, 0 = fail)
data = {
    'Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Pass': [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}

df = pd.DataFrame(data)

# Features as a 2D DataFrame, labels as a 1D Series
X = df[['Hours']]
y = df['Pass']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)

# Fit the model on the training rows
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the held-out rows and score the result
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)

print("Predictions:", predictions)
print("Accuracy:", accuracy)

Using the Iris Dataset

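The Iris dataset is a classic machine learning dataset. It contains 150 flower samples from three species, each described by four measurements, which makes it a multiclass classification problem.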

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

iris = load_iris()

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)

model = LogisticRegression(max_iter=200)

model.fit(X_train, y_train)

predictions = model.predict(X_test)

print("Accuracy:",
      accuracy_score(y_test, predictions))
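
With three classes, `predict_proba` returns one column per class, and the predicted label is the class with the highest probability. The sketch below repeats the Iris setup to show this. The variable names `proba` and `predicted` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)

proba = model.predict_proba(X_test)  # shape: (n_samples, 3)

# Pick the most probable class for each sample
predicted = model.classes_[np.argmax(proba, axis=1)]

print(proba.shape)                                       # (30, 3)
print(iris.target_names[predicted[:5]])                  # species names
print(np.array_equal(predicted, model.predict(X_test)))  # True
```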

Important Evaluation Metrics

Beyond accuracy, Logistic Regression can be evaluated using:

Precision

Measures the fraction of positive predictions that were actually positive: TP / (TP + FP).

from sklearn.metrics import precision_score

Recall

Measures the fraction of actual positives that the model identified: TP / (TP + FN).

from sklearn.metrics import recall_score

F1 Score

Balances precision and recall by taking their harmonic mean.

from sklearn.metrics import f1_score

Confusion Matrix

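Tabulates actual classes against predicted classes, showing the counts of true negatives, false positives, false negatives, and true positives.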

from sklearn.metrics import confusion_matrix
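
The four metrics above can be computed together on a small example. The `y_true` and `y_pred` lists below are hypothetical labels written by hand so the results are easy to verify. They are not output from any model in this tutorial.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 3 true positives, 1 false negative,
# 2 false positives, 2 true negatives
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

precision = precision_score(y_true, y_pred)  # 3 / (3 + 2) = 0.6
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)                # harmonic mean, about 0.667

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)

print("Precision:", precision)
print("Recall:", recall)
print("F1:", round(f1, 3))
print(cm)
```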

Advantages of Logistic Regression

  • Easy to understand and implement
  • Fast training process
  • Works well when the classes are roughly linearly separable
  • Provides probability outputs
  • Suitable for binary and multiclass classification
  • Requires fewer computational resources

Limitations of Logistic Regression

  • Assumes a linear relationship between the features and the log-odds of the outcome
  • Less effective for highly complex data
  • Sensitive to outliers
  • Performance decreases with non-linear patterns
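
The effect of non-linear patterns can be illustrated with synthetic data in which one class forms a ring around the other. This is a sketch under stated assumptions: `make_circles` and its settings are chosen here for demonstration, and adding squared features with `PolynomialFeatures` is one possible workaround rather than something the tutorial prescribes.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: an inner circle of one class inside a ring of the other
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Plain logistic regression can only draw a straight boundary
linear_model = LogisticRegression().fit(X_train, y_train)
linear_acc = linear_model.score(X_test, y_test)

# Squared and interaction terms let the same model fit a circular boundary
poly_model = make_pipeline(
    PolynomialFeatures(degree=2),
    LogisticRegression(),
).fit(X_train, y_train)
poly_acc = poly_model.score(X_test, y_test)

print("Linear features accuracy:", linear_acc)
print("Degree-2 features accuracy:", poly_acc)
```

On data like this, the plain model tends to score close to chance, while the version with squared features separates the classes far better.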

Real-World Applications

Logistic Regression is used in:

  • Email spam detection
  • Fraud detection
  • Disease diagnosis
  • Credit risk assessment
  • Customer churn prediction
  • Sentiment analysis
  • Marketing campaign response prediction

Best Practices

  • Scale features when necessary.
  • Remove irrelevant variables.
  • Handle missing values properly.
  • Evaluate using multiple metrics.
  • Use cross-validation for reliable results.
  • Monitor class imbalance issues.
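
Two of these practices, feature scaling and cross-validation, combine naturally in a scikit-learn pipeline. The sketch below uses the Iris data with five folds. Both the dataset and the fold count are illustrative choices, not recommendations from the tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling sits inside the pipeline, so each fold is scaled using
# statistics from its own training portion only
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=200),
)

scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Averaging over several folds gives a steadier estimate than a single train/test split, especially on small datasets.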

Conclusion

Logistic Regression is one of the most important classification algorithms in machine learning. It is simple, efficient, interpretable, and widely used in real-world applications. By combining Logistic Regression with Python and Scikit-Learn, developers can quickly build powerful predictive models for binary and multiclass classification tasks. Understanding Logistic Regression provides a strong foundation for learning more advanced machine learning algorithms and building intelligent data-driven applications.



