
Logistic Regression in Python – Testing Model with Scikit-Learn | Evaluation Guide

After building and training a Logistic Regression classifier, the next crucial step is testing the model. Testing allows us to evaluate how well the model performs on unseen data and whether it is reliable for real-world predictions.

A machine learning model is only useful if it performs well on data it has never seen before. That is why testing is a key part of the machine learning workflow.

In this tutorial, you will learn how to test a Logistic Regression model in Python using Scikit-Learn and interpret the results effectively.


Why Testing is Important

Testing helps to:

  • Measure model accuracy
  • Detect overfitting or underfitting
  • Evaluate real-world performance
  • Compare different models
  • Improve decision-making

Without testing, we cannot trust model predictions.


Testing Workflow Overview

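A typical Logistic Regression testing process includes:

1. Load the trained model (and the scaler fitted during training)
2. Prepare and scale the test data
3. Make predictions
4. Compare predictions with actual values
5. Evaluate metrics (accuracy, confusion matrix, classification report)
6. Interpret the results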

Step 1: Load Trained Model

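If the model was trained and saved earlier (for example with joblib.dump), reload it instead of retraining.

import joblib

# Load the fitted LogisticRegression saved by the training script
model = joblib.load("models/logistic_model.pkl")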
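
A file like models/logistic_model.pkl is typically produced by joblib.dump at the end of training. The sketch below round-trips a model through joblib using small synthetic data (an assumption standing in for the tutorial's customer data) and a temporary directory, then checks that the reloaded model predicts exactly like the original.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the training data (assumption, not the tutorial's dataset)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logistic_model.pkl")
    joblib.dump(model, path)          # what a training script would do
    reloaded = joblib.load(path)      # what Step 1 does

# The reloaded model makes the same predictions as the original
same_predictions = np.array_equal(model.predict(X_train), reloaded.predict(X_train))
print(same_predictions)  # True
```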

Step 2: Prepare Test Data

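Load the held-out test set and select the same feature columns, in the same order, that the model was trained on.

import pandas as pd

# Test rows the model has never seen during training
data = pd.read_csv("data/customers_test.csv")

# Features and target, using the same column names as in training
X_test = data[['Age', 'Salary']]
y_test = data['Purchased']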
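
If your data arrives as a single file rather than a separate customers_test.csv, one common way to obtain unseen test rows is train_test_split. The sketch below uses a small made-up DataFrame with the tutorial's column names (the values are invented for illustration); stratify=y keeps the class ratio similar in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up data using the tutorial's column names (values are illustrative only)
rng = np.random.default_rng(42)
data = pd.DataFrame({
    "Age": rng.integers(18, 65, size=100),
    "Salary": rng.integers(20_000, 150_000, size=100),
    "Purchased": np.tile([0, 1], 50),
})

X = data[["Age", "Salary"]]
y = data["Purchased"]

# Hold out 25% of the rows; the model must never see them during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(len(X_train), len(X_test))  # 75 25
# No row appears in both sets
print(X_train.index.intersection(X_test.index).empty)  # True
```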

Step 3: Feature Scaling

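Use the same scaler that was fitted during training, and only call transform() on the test set. Calling fit_transform() here would recompute the mean and standard deviation from the test data, so the test features would no longer be on the scale the model learned from.

import joblib

# Load the StandardScaler fitted on the training data
# (hypothetical path: use wherever your training script saved it)
scaler = joblib.load("models/scaler.pkl")

# Apply the training statistics to the test features (no refitting)
X_test = scaler.transform(X_test)

Important: save the fitted scaler at training time (for example with joblib.dump) so it can be reloaded here.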
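
The sketch below shows where a saved scaler comes from: it is fitted on the training features only, saved, and later loaded to transform the test features. The made-up data, the temporary directory, and the file name scaler.pkl are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Made-up Age/Salary-like features (assumption, for illustration only)
X_train = np.column_stack([rng.uniform(18, 65, 200), rng.uniform(20_000, 150_000, 200)])
X_test = np.column_stack([rng.uniform(18, 65, 50), rng.uniform(20_000, 150_000, 50)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.pkl")  # hypothetical file name

    # Training time: fit on the training set only, then save
    scaler = StandardScaler().fit(X_train)
    joblib.dump(scaler, path)

    # Testing time: load and transform (no refitting)
    loaded_scaler = joblib.load(path)
    X_test_scaled = loaded_scaler.transform(X_test)

# The test set is scaled with the training mean and standard deviation
expected = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_scaled, expected))  # True
```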


Step 4: Make Predictions

Call predict() to get a predicted class label (0 or 1) for each test row.

y_pred = model.predict(X_test)

print(y_pred)
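
For a binary model, predict() is a thresholded linear score: Scikit-Learn computes the decision function w·x + b from coef_ and intercept_ and returns classes_[1] wherever that score is positive. The sketch below checks this on synthetic, already-scaled features (an assumption) by recomputing the score by hand.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))           # synthetic, already-scaled features
y = (2 * X[:, 0] - X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Linear score w·x + b for every row
scores = (X @ model.coef_.T + model.intercept_).ravel()
# Positive score -> classes_[1], otherwise classes_[0]
manual_pred = model.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, model.decision_function(X)))  # True
print(np.array_equal(manual_pred, model.predict(X)))     # True
```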

Example output:

[0 0 1 1 0 1]

Step 5: Compare Predictions

Compare predicted values with actual values.

comparison = pd.DataFrame({
    "Actual": y_test,
    "Predicted": y_pred
})

print(comparison)
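
Once predictions and actual values sit in one DataFrame, the misclassified rows can be pulled out with a boolean filter. The labels below are made-up stand-ins for y_test and y_pred.

```python
import numpy as np
import pandas as pd

# Illustrative labels (not real model output)
y_test = pd.Series([0, 0, 1, 1, 0, 1], name="Purchased")
y_pred = np.array([0, 1, 1, 0, 0, 1])

comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})

# Keep only the rows where the prediction was wrong
errors = comparison[comparison["Actual"] != comparison["Predicted"]]
print(errors)
print(errors.index.tolist())  # [1, 3]
```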

Step 6: Calculate Accuracy

Accuracy is the fraction of test samples whose predicted label matches the actual label.

from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)

Example output:

Accuracy: 0.88
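
Accuracy is simply the mean of a boolean "prediction equals actual" array. The sketch below, with made-up labels, computes it by hand and with accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Illustrative labels: 8 of the 10 predictions are correct
y_test = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 1, 0, 0, 1, 0])

manual = np.mean(y_test == y_pred)
print(manual)                          # 0.8
print(accuracy_score(y_test, y_pred))  # 0.8
```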

Step 7: Confusion Matrix

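The confusion matrix breaks predictions down by actual and predicted class. In Scikit-Learn, rows are the actual classes and columns are the predicted classes, both in sorted label order (0, then 1).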

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)

print(cm)

Example:

[[50  5]
 [ 6 39]]

Interpretation:

  • True Negatives: 50
  • False Positives: 5
  • False Negatives: 6
  • True Positives: 39
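
Working from the example matrix (100 test samples), the headline metrics follow directly from the four counts. The sketch below rebuilds label arrays that reproduce that matrix (an illustrative construction, not real model output), unpacks it with ravel(), and computes accuracy, precision, and recall by hand.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Rebuild labels that reproduce the example matrix [[50, 5], [6, 39]]
y_test = np.repeat([0, 0, 1, 1], [50, 5, 6, 39])
y_pred = np.repeat([0, 1, 0, 1], [50, 5, 6, 39])

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()   # binary case: rows = actual, columns = predicted

accuracy = (tp + tn) / cm.sum()   # 89 / 100
precision = tp / (tp + fp)        # 39 / 44
recall = tp / (tp + fn)           # 39 / 45

print(tn, fp, fn, tp)              # 50 5 6 39
print(round(accuracy, 3), round(precision, 3), round(recall, 3))  # 0.89 0.886 0.867

# The same values come out of Scikit-Learn's metric functions
print(np.isclose(precision, precision_score(y_test, y_pred)))  # True
print(np.isclose(recall, recall_score(y_test, y_pred)))        # True
```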

Step 8: Classification Report

from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

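Output includes, for each class:

  • Precision: of the samples predicted as this class, the fraction that truly belong to it
  • Recall: of the samples that truly belong to this class, the fraction the model identified
  • F1-score: the harmonic mean of precision and recall
  • Support: the number of actual samples of this class in y_test

The report also shows overall accuracy plus macro and weighted averages.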
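
To use the report's numbers in code rather than read them off the printed table, classification_report accepts output_dict=True. The sketch below uses made-up labels and checks that the reported F1-score is the harmonic mean of precision and recall.

```python
import numpy as np
from sklearn.metrics import classification_report

# Illustrative labels (not real model output)
y_test = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

report = classification_report(y_test, y_pred, output_dict=True)

# Per-class entries are keyed by the label as a string ("0", "1"),
# alongside summary rows such as "accuracy" and "macro avg"
p = report["1"]["precision"]
r = report["1"]["recall"]
f1 = report["1"]["f1-score"]

print(p, r)                                  # 0.8 0.8
print(np.isclose(f1, 2 * p * r / (p + r)))   # True
```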

Step 9: Predict Probability

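predict_proba() returns one probability per class for each test row. Columns follow the order of model.classes_, so column 0 is the probability of class 0 and column 1 the probability of class 1; each row sums to 1.

y_prob = model.predict_proba(X_test)

print(y_prob[:5])

Example (truncated):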

[[0.80 0.20]
 [0.30 0.70]
 [0.15 0.85]]
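
Because predict_proba exposes the probabilities, you can choose your own decision threshold. For a binary LogisticRegression, predict() corresponds to a threshold of 0.5 on the class-1 column; lowering the threshold flags more rows as positive, which can only keep or raise recall, usually at some cost in precision. The synthetic data and the 0.3 threshold below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))   # synthetic features
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Column 1 holds P(class == model.classes_[1]), here class 1
p_positive = model.predict_proba(X)[:, 1]

default_pred = (p_positive > 0.5).astype(int)   # matches model.predict(X)
lenient_pred = (p_positive >= 0.3).astype(int)  # hypothetical lower threshold

print(np.array_equal(default_pred, model.predict(X)))  # True
print(recall_score(y, default_pred), recall_score(y, lenient_pred))
```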

Understanding Testing Results

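Good Model Indicators:

  • Accuracy clearly above a simple baseline, such as always predicting the majority class (a fixed cutoff like 80% is only a rough rule of thumb)
  • Balanced precision and recall
  • Low false positives and false negatives

Poor Model Indicators:

  • Accuracy close to the majority-class baseline
  • High false positive or false negative rates
  • A large gap between training and test accuracy (overfitting), or low accuracy on both (underfitting)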
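
Two quick checks make these indicators concrete: compare training and test accuracy, and compare test accuracy with a baseline that always predicts the majority class (DummyClassifier here). The synthetic dataset from make_classification is an assumption standing in for real data.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption, stands in for the customer dataset)
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
baseline_acc = baseline.score(X_test, y_test)

# A useful model beats the baseline; a big train/test gap hints at overfitting
print(f"train={train_acc:.3f} test={test_acc:.3f} baseline={baseline_acc:.3f}")
```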

Visualizing Test Results

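import matplotlib.pyplot as plt

# Actual labels as circles, predictions as crosses, so that
# overlapping points (correct predictions) both stay visible
plt.scatter(range(len(y_test)), y_test, color='blue', marker='o', label='Actual')
plt.scatter(range(len(y_pred)), y_pred, color='red', marker='x', label='Predicted')

plt.xlabel("Test sample index")
plt.ylabel("Class label")
plt.title("Actual vs Predicted Results")
plt.legend()
plt.show()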

Full Testing Code Example

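import pandas as pd
import joblib
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the trained model and the scaler fitted on the training data
# (the scaler path is hypothetical: use wherever your training script saved it)
model = joblib.load("models/logistic_model.pkl")
scaler = joblib.load("models/scaler.pkl")

# Load test data
data = pd.read_csv("data/customers_test.csv")

X_test = data[['Age', 'Salary']]
y_test = data['Purchased']

# Scale with the training statistics (transform only, never fit)
X_test = scaler.transform(X_test)

# Predict
y_pred = model.predict(X_test)

# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))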
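
One alternative that removes the risk of forgetting or refitting the scaler is to save a Scikit-Learn Pipeline that chains StandardScaler and LogisticRegression, so a single file holds both steps and predict() scales raw inputs automatically. This is a different packaging from saving the scaler and model separately; the synthetic data and the file name logistic_pipeline.pkl are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
# Made-up unscaled Age/Salary-like features (assumption)
X = np.column_stack([rng.uniform(18, 65, 300), rng.uniform(20_000, 150_000, 300)])
y = (X[:, 0] / 65 + X[:, 1] / 150_000 > 1.0).astype(int)

X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# The scaler is fitted on the training data as part of the pipeline
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logistic_pipeline.pkl")  # hypothetical file name
    joblib.dump(pipeline, path)
    loaded = joblib.load(path)

# Raw (unscaled) test features go straight in; the pipeline scales them
y_pred = loaded.predict(X_test)
print(np.array_equal(y_pred, pipeline.predict(X_test)))  # True
```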

Best Practices for Testing

  • Always use unseen data
  • Load saved scaler and model
  • Avoid data leakage
  • Evaluate multiple metrics
  • Test on real-world data

Common Mistakes

Avoid:

  • Testing on training data
  • Refitting scaler on test set
  • Ignoring evaluation metrics
  • Relying only on accuracy
  • Not saving preprocessing steps
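
Why relying only on accuracy is risky: with imbalanced classes, a model can score high accuracy while missing every positive case. The sketch below uses invented labels (95 negatives, 5 positives) and a "model" that always predicts 0.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Invented, heavily imbalanced labels: 95 negatives, 5 positives
y_test = np.array([0] * 95 + [1] * 5)
# A useless "model" that always predicts the majority class
y_pred = np.zeros_like(y_test)

print(accuracy_score(y_test, y_pred))  # 0.95
print(recall_score(y_test, y_pred))    # 0.0 (no positive case is found)
```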

Real-World Applications

Testing Logistic Regression models is essential in:

  • Fraud detection systems
  • Healthcare prediction models
  • Customer behavior analysis
  • Credit scoring systems
  • Marketing analytics

Conclusion

Testing is a critical step in the machine learning lifecycle. It ensures that your Logistic Regression model performs well on unseen data and can be trusted in real-world applications.

By evaluating predictions with accuracy, the confusion matrix, and the classification report, you can judge whether your model is reliable enough to deploy for practical use cases.



