Logistic Regression in Python – Testing
After building and training a Logistic Regression classifier, the next crucial step is testing the model. Testing allows us to evaluate how well the model performs on unseen data and whether it is reliable for real-world predictions.
A machine learning model is only useful if it performs well on data it has never seen before. That is why testing is a key part of the machine learning workflow.
In this tutorial, you will learn how to test a Logistic Regression model in Python using Scikit-Learn and interpret the results effectively.
Why Testing is Important
Testing helps to:
- Measure model accuracy
- Detect overfitting or underfitting
- Evaluate real-world performance
- Compare different models
- Improve decision-making
Without testing, we cannot trust model predictions.
Testing Workflow Overview
A typical Logistic Regression testing process includes:
1. Load Trained Model
2. Make Predictions
3. Compare with Actual Values
4. Evaluate Metrics
5. Interpret Results
Step 1: Load Trained Model
If the model is already trained, you can reuse it.
import joblib
model = joblib.load("models/logistic_model.pkl")
Step 2: Prepare Test Data
Ensure test data is properly preprocessed.
import pandas as pd
data = pd.read_csv("data/customers_test.csv")
X_test = data[['Age', 'Salary']]
y_test = data['Purchased']
Step 3: Feature Scaling
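Use the same scaler that was fitted during training. Refitting on the test set would rescale the features with different statistics than the ones the model was trained on.
import joblib
# Hypothetical path: assumes the fitted scaler was saved with joblib during training
scaler = joblib.load("models/scaler.pkl")
X_test = scaler.transform(X_test)
Important: Load the saved scaler and call transform(). Do not call fit_transform() on test data.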
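The save-then-load pattern the note above describes could look roughly like the following. This is a minimal sketch, not the tutorial's own training code: the small arrays stand in for the Age and Salary columns, and the temporary file path is made up for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the Age/Salary columns used in this tutorial.
X_train = np.array([[25, 30000], [35, 50000], [45, 70000], [55, 90000]], dtype=float)
X_test = np.array([[30, 40000], [50, 80000]], dtype=float)

# Training time: fit the scaler on training data only, then persist it.
scaler = StandardScaler().fit(X_train)
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")  # hypothetical path
joblib.dump(scaler, scaler_path)

# Test time: load the saved scaler and call transform(), not fit_transform().
loaded_scaler = joblib.load(scaler_path)
X_test_scaled = loaded_scaler.transform(X_test)

# For contrast only: refitting on the test set produces different feature values.
X_test_refit = StandardScaler().fit_transform(X_test)

print("Scaled with training statistics:")
print(X_test_scaled)
print("Scaled after refitting on test data (do not do this):")
print(X_test_refit)
```

The two printed arrays differ because the second one uses the mean and standard deviation of the test rows instead of the training rows, so the model would see inputs on a different scale than it learned from.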
Step 4: Make Predictions
Use the trained model to predict outcomes.
y_pred = model.predict(X_test)
print(y_pred)
Example output:
[0 0 1 1 0 1]
Step 5: Compare Predictions
Compare predicted values with actual values.
comparison = pd.DataFrame({
"Actual": y_test,
"Predicted": y_pred
})
print(comparison)
Step 6: Calculate Accuracy
Accuracy is the fraction of predictions that match the actual labels.
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Example output:
Accuracy: 0.88
Step 7: Confusion Matrix
The confusion matrix shows detailed prediction results.
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
Example:
[[50  5]
 [ 6 39]]
Interpretation:
- True Negatives: 50
- False Positives: 5
- False Negatives: 6
- True Positives: 39
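The four counts above are enough to derive the usual summary metrics by hand. The sketch below reuses the example matrix and assumes scikit-learn's layout, where rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Example confusion matrix from this step: rows are actual classes,
# columns are predicted classes (scikit-learn's convention).
cm = np.array([[50, 5],
               [6, 39]])

# For a 2x2 matrix, ravel() yields the counts in this order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # correct predictions / all predictions
precision = tp / (tp + fp)           # how many predicted positives were right
recall = tp / (tp + fn)              # how many actual positives were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")   # 0.890
print(f"Precision: {precision:.3f}")  # 0.886
print(f"Recall:    {recall:.3f}")     # 0.867
print(f"F1-score:  {f1:.3f}")         # 0.876
```

Taken on its own, this matrix corresponds to an accuracy of 0.89 (89 correct out of 100).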
Step 8: Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
Output includes:
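- Precision: of the samples predicted as a class, the fraction that actually belong to it
- Recall: of the samples that actually belong to a class, the fraction the model found
- F1-score: the harmonic mean of precision and recall
- Support: the number of actual samples of each class in the test set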
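If you need these values as numbers, not just as printed text, classification_report accepts output_dict=True. The sketch below uses a small set of made-up labels (not the tutorial's customer data) so the results can be checked by hand.

```python
from sklearn.metrics import classification_report

# Hypothetical labels, chosen so the numbers are easy to verify manually.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# output_dict=True returns nested dictionaries keyed by class label (as strings).
report = classification_report(y_true, y_hat, output_dict=True)
positive = report["1"]

print("Precision (class 1):", round(positive["precision"], 3))  # 4 of 5 predicted positives are right
print("Recall (class 1):   ", round(positive["recall"], 3))     # 4 of 6 actual positives are found
print("F1-score (class 1): ", round(positive["f1-score"], 3))
print("Support (class 1):  ", positive["support"])
print("Overall accuracy:   ", report["accuracy"])
```

Reading the dictionary this way makes it easier to log metrics or compare several models programmatically.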
Step 9: Predict Probability
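Logistic Regression can return class probabilities as well as labels. Each row of the output holds the probability of each class for one sample, and the values in a row sum to 1.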
y_prob = model.predict_proba(X_test)
print(y_prob[:5])
Example:
[[0.80 0.20]
 [0.30 0.70]
 [0.15 0.85]]
Understanding Testing Results
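The probabilities from Step 9 only become class labels once a cutoff is applied. The rough sketch below reuses the example values and assumes the columns follow the order of model.classes_ (class 0 first, class 1 second). The stricter cutoff of 0.8 is a made-up value for illustration.

```python
import numpy as np

# Example probabilities from Step 9: column 0 is P(class 0), column 1 is P(class 1).
y_prob = np.array([[0.80, 0.20],
                   [0.30, 0.70],
                   [0.15, 0.85]])

p_positive = y_prob[:, 1]

# For a binary model, predict() picks the more likely class,
# which roughly corresponds to a 0.5 cutoff on P(class 1).
default_labels = (p_positive >= 0.5).astype(int)

# Hypothetical stricter cutoff: only flag a positive when the model is quite sure.
strict_cutoff = 0.8
strict_labels = (p_positive >= strict_cutoff).astype(int)

print("Cutoff 0.5:", default_labels)
print("Cutoff 0.8:", strict_labels)
```

Raising the cutoff tends to reduce false positives at the cost of more false negatives, so the right value depends on which error is more expensive in your application.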
Good Model Indicators:
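- High accuracy (above 80% is a common rule of thumb, but the right bar depends on the problem and on how balanced the classes are)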
- Balanced precision and recall
- Low false positives and false negatives
Poor Model Indicators:
- Low accuracy
- High error rates
- Overfitting or underfitting
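One way to check for the overfitting or underfitting listed above is to compare accuracy on the training data with accuracy on the test data. The sketch below uses synthetic data from make_classification as a stand-in for the customer dataset, and a scaler-plus-model pipeline that is an assumption of this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data standing in for the Age/Salary example.
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaler and classifier trained together on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
gap = train_acc - test_acc

print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
print(f"Gap:            {gap:.3f}")
```

As a rough guide, a training score far above the test score suggests overfitting, while low scores on both suggest underfitting. How large a gap is acceptable is a judgment call for each project.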
Visualizing Test Results
import matplotlib.pyplot as plt
plt.scatter(range(len(y_test)), y_test, color='blue', label='Actual')
plt.scatter(range(len(y_pred)), y_pred, color='red', label='Predicted')
plt.title("Actual vs Predicted Results")
plt.legend()
plt.show()
Full Testing Code Example
import pandas as pd
import joblib
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Load model
model = joblib.load("models/logistic_model.pkl")
# Load the scaler fitted during training (hypothetical path; assumes it was saved with joblib)
scaler = joblib.load("models/scaler.pkl")
# Load test data
data = pd.read_csv("data/customers_test.csv")
X_test = data[['Age', 'Salary']]
y_test = data['Purchased']
# Scale with the training-time scaler (transform only, no refitting)
X_test = scaler.transform(X_test)
# Predict
y_pred = model.predict(X_test)
# Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
Best Practices for Testing
- Always use unseen data
- Load saved scaler and model
- Avoid data leakage
- Evaluate multiple metrics
- Test on real-world data
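One way to follow several of these practices at once is to bundle the scaler and the classifier into a single scikit-learn Pipeline and save that one object. This is a sketch under assumptions, not the tutorial's own setup: the data is synthetic and the file path is a temporary, made-up location.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data.
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# The pipeline fits the scaler on the training data only and applies
# the same scaling automatically whenever predict() is called.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Persist preprocessing and model together (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "logistic_pipeline.pkl")
joblib.dump(pipeline, path)

# At test time: load once, pass raw (unscaled) features.
loaded = joblib.load(path)
y_pred = loaded.predict(X_test)
print("Test accuracy:", loaded.score(X_test, y_test))
```

Because the scaler travels with the model, there is no separate preprocessing step to forget or refit at test time.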
Common Mistakes
Avoid:
- Testing on training data
- Refitting scaler on test set
- Ignoring evaluation metrics
- Relying only on accuracy
- Not saving preprocessing steps
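To see why relying only on accuracy is risky, consider an imbalanced test set. The labels below are made up for illustration: a "model" that always predicts the majority class scores well on accuracy while missing every positive case.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate "model" that always predicts the majority class.
y_hat = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_hat)
rec = recall_score(y_true, y_hat)

print("Accuracy:", acc)  # 0.95
print("Recall:  ", rec)  # 0.0
```

Accuracy alone would make this model look strong, which is why the confusion matrix and classification report are worth checking alongside it.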
Real-World Applications
Testing Logistic Regression models is essential in:
- Fraud detection systems
- Healthcare prediction models
- Customer behavior analysis
- Credit scoring systems
- Marketing analytics
Conclusion
Testing is a critical step in the machine learning lifecycle. It ensures that your Logistic Regression model performs well on unseen data and can be trusted in real-world applications.
By properly evaluating predictions using accuracy, confusion matrix, and classification reports, you can confidently deploy your model for practical use cases.

