Logistic Regression in Python – Building Classifier with Scikit-Learn | Step-by-Step Guide

After preparing, cleaning, and splitting the data, the next step in a machine learning workflow is building the classifier. This is where we train the Logistic Regression model so it can learn patterns from the training data and make predictions on unseen data.

In this tutorial, you will learn how to build a Logistic Regression classifier in Python using Scikit-Learn, train it on real data, and understand how it makes predictions.


What is a Classifier?

A classifier is a machine learning model that predicts categorical labels.

For Logistic Regression:

  • Output = Probability
  • Final Result = Class label (0 or 1)

Examples:

Input            Output
Customer Data    Buy / Not Buy
Email            Spam / Not Spam
Medical Data     Disease / No Disease

How Logistic Regression Classifier Works

Logistic Regression works in three steps:

Step 1: Linear Combination

It combines the input features using learned weights and an intercept to produce a score z = w1*x1 + w2*x2 + ... + b.

Step 2: Sigmoid Function

The sigmoid function converts the score z into a probability between 0 and 1:

P = 1 / (1 + e^(-z))

Step 3: Threshold Decision

  • If P ≥ 0.5 → Class 1
  • If P < 0.5 → Class 0
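The three steps above can be sketched in plain Python. This is only an illustration: the weights, bias, and feature values are made-up numbers, not values learned from the customer data, and the helper name predict_one is hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_one(features, weights, bias, threshold=0.5):
    # Step 1: linear combination z = w1*x1 + w2*x2 + ... + b
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Step 2: the sigmoid turns the score into a probability
    p = sigmoid(z)
    # Step 3: threshold decision
    label = 1 if p >= threshold else 0
    return z, p, label


# Hypothetical weights and bias, chosen only for illustration
weights = [0.8, -0.5]
bias = 0.1

z, p, label = predict_one([1.0, 2.0], weights, bias)
print("score:", z)
print("probability:", p)
print("label:", label)
```

A negative score gives a probability below 0.5 and therefore Class 0; a score of exactly 0 sits on the decision boundary.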

Import Required Libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

Load Dataset

data = pd.read_csv("data/customers.csv")

print(data.head())

Prepare Features and Target

X = data[['Age', 'Salary']]
y = data['Purchased']

Split Data

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    random_state=42
)

Feature Scaling

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Building Logistic Regression Classifier

Now we create and train the model.

model = LogisticRegression()

model.fit(X_train, y_train)

This step is called training the classifier.

During fitting, the model estimates one weight per feature plus an intercept. Together these capture:

  • Relationship between Age and Purchase
  • Relationship between Salary and Purchase
  • Patterns in customer behavior
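What the model has learned can be inspected through its coef_ and intercept_ attributes. The sketch below uses synthetic Age and Salary values as a stand-in for data/customers.csv; the data-generating rule is an assumption made purely so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (an assumption, not the real file)
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 60, n)
salary = rng.uniform(20_000, 150_000, n)
# Made-up rule: older, higher-paid customers buy more often, plus some noise
signal = (age - 39) / 12 + (salary - 85_000) / 37_000 + rng.normal(0, 0.5, n)
purchased = (signal > 0).astype(int)

X = StandardScaler().fit_transform(np.column_stack([age, salary]))
model = LogisticRegression().fit(X, purchased)

# One weight per feature; the sign shows the direction of the relationship
for name, coef in zip(["Age", "Salary"], model.coef_[0]):
    print(name, "weight:", coef)
print("Intercept:", model.intercept_[0])
```

Because the features are standardized, the magnitudes of the two weights are roughly comparable: a larger absolute weight means the feature moves the score more per standard deviation.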

Making Predictions

After training, we use the model to predict results.

y_pred = model.predict(X_test)

print(y_pred)

Example output:

[0 1 1 0 1]

Understanding Predictions

  • 0 → Customer will NOT purchase
  • 1 → Customer will purchase

The model applies learned patterns to unseen data.
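One practical detail when predicting for new records: they need the same scaling that was fitted on the training data. The following is a minimal sketch with synthetic data and a hypothetical new customer; none of the numbers come from the original dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (an assumption, not the real file)
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 60, n)
salary = rng.uniform(20_000, 150_000, n)
purchased = ((age - 39) / 12 + (salary - 85_000) / 37_000 > 0).astype(int)

X_train = np.column_stack([age, salary])

scaler = StandardScaler()
model = LogisticRegression()
model.fit(scaler.fit_transform(X_train), purchased)

# Hypothetical new customer: 45 years old, salary 120000
new_customer = np.array([[45, 120_000]])

# Reuse the fitted scaler (transform only, never fit again on new data)
new_scaled = scaler.transform(new_customer)
prediction = model.predict(new_scaled)
print("Predicted class:", prediction[0])
```

Calling fit_transform on the new record instead of transform would rescale it against itself and produce a meaningless input.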


Predicting Probabilities

Besides hard class labels, Logistic Regression can return the estimated probability of each class through predict_proba.

probs = model.predict_proba(X_test)

print(probs[:5])

Example output (first three rows shown):

[[0.82 0.18]
 [0.25 0.75]
 [0.10 0.90]]

Interpretation:

  • First column → Probability of class 0
  • Second column → Probability of class 1

Decision Function

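Internally, the model computes a raw score z (the linear combination of the scaled features) before applying the sigmoid transformation. The decision_function method returns that score.

print(model.decision_function(X_test))

A score above 0 corresponds to a probability above 0.5 for Class 1, and larger scores mean higher probability.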
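The link between the score and the probability can be checked numerically. This sketch again uses a synthetic dataset from make_classification as an assumption in place of the customer data; it recomputes the score from coef_ and intercept_ and passes it through the sigmoid by hand.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature dataset (an assumption, not the real customer file)
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X, y)

sample = X[:5]

# Score reported by the model
scores = model.decision_function(sample)

# Same score rebuilt by hand: z = w . x + b
manual_scores = sample @ model.coef_[0] + model.intercept_[0]

# Sigmoid of the score gives the class 1 probability
probs_from_scores = 1.0 / (1.0 + np.exp(-scores))
probs = model.predict_proba(sample)[:, 1]

print(np.allclose(scores, manual_scores))
print(np.allclose(probs_from_scores, probs))
```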


Evaluating Classifier Performance

Accuracy Score

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)

Confusion Matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

print(cm)
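To read the matrix: in Scikit-Learn, rows are actual classes and columns are predicted classes. The small example below uses hand-written labels (not model output) so the four cells can be verified by counting.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-written labels for illustration only
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Layout for binary labels 0/1:
# [[TN FP]
#  [FN TP]]
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted 1s, how many were right
recall = tp / (tp + fn)     # of the actual 1s, how many were found

print("Accuracy:", accuracy)
print("Precision:", precision, precision_score(y_true, y_pred))
print("Recall:", recall, recall_score(y_true, y_pred))
```

These are the same precision and recall values that classification_report prints for class 1.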

Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

Full Classifier Code

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load data
data = pd.read_csv("data/customers.csv")

# Features and target
X = data[['Age', 'Salary']]
y = data['Purchased']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build classifier
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))

Advantages of Logistic Regression Classifier

  • Simple and easy to implement
  • Fast training process
  • Works well for binary classification
  • Provides probability outputs
  • Highly interpretable results

Limitations

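  • Assumes a linear decision boundary
  • Needs engineered features (for example, polynomial terms) to fit complex nonlinear data
  • Can be sensitive to outliers
  • Benefits from feature scaling, because Scikit-Learn applies L2 regularization by default and the penalty treats unscaled features unevenly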

Real-World Applications

Logistic Regression classifiers are used in:

  • Email spam detection
  • Credit card fraud detection
  • Medical diagnosis systems
  • Customer churn prediction
  • Marketing campaign prediction

Best Practices

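  • Scale numerical features, fitting the scaler on the training set only
  • Split the data before any fitting so the test set stays unseen
  • Check for class imbalance in the target
  • Evaluate with multiple metrics, not accuracy alone
  • Save the trained model (and its scaler) for reuse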
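Several of these practices can be combined in a few lines. The sketch below is one possible arrangement, not the only one: it uses a synthetic imbalanced dataset as a stand-in for the customer data, a stratified split, a pipeline that keeps the scaler and model together, and pickle for saving. The file name classifier.pkl is arbitrary.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced dataset (an assumption, not the real customer file)
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)

# Check class imbalance before modeling
print("Class counts:", np.bincount(y))

# stratify=y keeps the class ratio similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# The pipeline fits the scaler on training data only and reuses it at predict time
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

# Save scaler and model together, then load them back
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

same = np.array_equal(clf.predict(X_test), restored.predict(X_test))
print("Restored model gives identical predictions:", same)
```

Pickled models should only be loaded from trusted sources and, ideally, with the same Scikit-Learn version that created them.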

Conclusion

Building a Logistic Regression classifier in Python is a fundamental step in machine learning. It allows you to train a model that can predict categories based on input data and return meaningful probability values.

By using Scikit-Learn, you can quickly build, train, and evaluate classifiers for real-world problems such as customer prediction, fraud detection, and medical diagnosis. Mastering this step completes the core workflow of Logistic Regression modeling.



