Logistic Regression in Python – Building Classifier
After preparing, cleaning, and splitting the data, the next step in a machine learning workflow is building the classifier. This is where we train the Logistic Regression model so it can learn patterns from the training data and make predictions on unseen data.
In this tutorial, you will learn how to build a Logistic Regression classifier in Python using Scikit-Learn, train it on real data, and understand how it makes predictions.
What is a Classifier?
A classifier is a machine learning model that predicts categorical labels.
For Logistic Regression:
- Output = Probability
- Final Result = Class label (0 or 1)
Examples:
| Input | Output |
|---|---|
| Customer Data | Buy / Not Buy |
| Email Data | Spam / Not Spam |
| Medical Data | Disease / No Disease |
How Logistic Regression Classifier Works
Logistic Regression works in three steps:
Step 1: Linear Combination
It computes a weighted sum of the input features plus a bias term: z = w1·x1 + w2·x2 + ... + b.
Step 2: Sigmoid Function
It converts the linear output z into a probability between 0 and 1:
P = 1 / (1 + e^(-z))
Step 3: Threshold Decision
- If P ≥ 0.5 → Class 1
- If P < 0.5 → Class 0
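The three steps above can be sketched in a few lines of NumPy. The weights, bias, and feature values below are made-up numbers chosen for illustration; they are not learned from any dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for illustration only
weights = np.array([0.8, -0.4])   # one weight per feature
bias = 0.1
x = np.array([1.5, 0.5])          # one (already scaled) sample

# Step 1: linear combination of features and weights
z = np.dot(weights, x) + bias

# Step 2: sigmoid turns the score into a probability
p = sigmoid(z)

# Step 3: threshold the probability at 0.5 to get a class label
label = int(p >= 0.5)

print(f"z = {z:.2f}, P = {p:.3f}, class = {label}")
```

In practice Scikit-Learn finds the weights and bias during training; this sketch only shows how a prediction is computed once they are known.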
Import Required Libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
Load Dataset
data = pd.read_csv("data/customers.csv")
print(data.head())
Prepare Features and Target
X = data[['Age', 'Salary']]
y = data['Purchased']
Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.25,
    random_state=42
)
Feature Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Building Logistic Regression Classifier
Now we create and train the model.
model = LogisticRegression()
model.fit(X_train, y_train)
This step is called training the classifier.
The model learns:
- Relationship between Age and Purchase
- Relationship between Salary and Purchase
- Patterns in customer behavior
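One way to see what the model has learned is to inspect the fitted weights. The sketch below uses a small synthetic dataset in place of data/customers.csv. The generated Age, Salary, and Purchased values, and the rule that links them, are assumptions made for illustration, so the printed numbers will not match your own data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer data (hypothetical values)
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(18, 60, size=n),
    "Salary": rng.integers(20_000, 120_000, size=n),
})

# Assumed rule: older, higher-paid customers are more likely to purchase
score = 0.08 * (demo["Age"] - 38) + 0.00004 * (demo["Salary"] - 70_000)
demo["Purchased"] = (score + rng.normal(0, 0.5, size=n) > 0).astype(int)

X_demo = StandardScaler().fit_transform(demo[["Age", "Salary"]])
demo_model = LogisticRegression().fit(X_demo, demo["Purchased"])

# For a binary problem, coef_ has shape (1, n_features) and intercept_ has shape (1,)
for name, weight in zip(["Age", "Salary"], demo_model.coef_[0]):
    print(f"{name}: {weight:+.3f}")
print(f"Intercept: {demo_model.intercept_[0]:+.3f}")
```

A positive weight pushes the prediction toward class 1 as the feature increases. Because the features are standardized here, the magnitudes of the weights are roughly comparable.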
Making Predictions
After training, we use the model to predict results.
y_pred = model.predict(X_test)
print(y_pred)
Example output:
[0 1 1 0 1]
Understanding Predictions
- 0 → Customer will NOT purchase
- 1 → Customer will purchase
The model applies learned patterns to unseen data.
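As a rough sketch of applying a trained model to new customers, the example below trains on a tiny hand-written dataset (the values are hypothetical, not taken from data/customers.csv) and then predicts for two customers the model has never seen. The important detail is that new data goes through the same fitted scaler.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Tiny hypothetical training set (illustrative values only)
train = pd.DataFrame({
    "Age":       [22, 25, 47, 52, 46, 56, 23, 30],
    "Salary":    [25_000, 32_000, 90_000, 110_000, 85_000, 120_000, 28_000, 40_000],
    "Purchased": [0, 0, 1, 1, 1, 1, 0, 0],
})

scaler = StandardScaler()
X_train = scaler.fit_transform(train[["Age", "Salary"]])
model = LogisticRegression().fit(X_train, train["Purchased"])

# New customers must use the SAME fitted scaler: transform, not fit_transform
new_customers = pd.DataFrame({"Age": [24, 50], "Salary": [30_000, 100_000]})
new_pred = model.predict(scaler.transform(new_customers))
print(new_pred)
```

Calling fit_transform on the new data instead would rescale it with different statistics and make the predictions unreliable.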
Predicting Probabilities
Logistic Regression also provides the predicted probability of each class, not just the final label.
probs = model.predict_proba(X_test)
print(probs[:3])
Example output:
[[0.82 0.18]
 [0.25 0.75]
 [0.10 0.90]]
Interpretation:
- First column → Probability of class 0
- Second column → Probability of class 1
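The link between these probabilities and the final labels can be checked directly. The sketch below uses synthetic data from make_classification as a stand-in for the scaled customer features, and the 0.7 threshold is an arbitrary value chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data standing in for the scaled customer features
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
clf = LogisticRegression().fit(X, y)

probs = clf.predict_proba(X)

# Each row holds P(class 0) and P(class 1), so it sums to 1
print(np.allclose(probs.sum(axis=1), 1.0))

# Thresholding the class-1 column at 0.5 reproduces predict()
default_labels = (probs[:, 1] >= 0.5).astype(int)
print(np.array_equal(default_labels, clf.predict(X)))

# A stricter, hypothetical threshold labels fewer samples as class 1
strict_labels = (probs[:, 1] >= 0.7).astype(int)
print(default_labels.sum(), strict_labels.sum())
```

Raising the threshold can be useful when a false positive is more costly than a false negative, at the price of missing some true positives.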
Decision Function
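Internally, the model calculates a raw score z before applying the sigmoid transformation.
print(model.decision_function(X_test))
Higher values mean a higher probability of Class 1. A positive score corresponds to P > 0.5 (Class 1) and a negative score to P < 0.5 (Class 0).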
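To make the relationship between the score and the probability concrete, the sketch below recomputes both by hand for a default binary LogisticRegression. It again uses synthetic data as a stand-in, so treat it as an illustration rather than part of the customer example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (an assumption for illustration)
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
clf = LogisticRegression().fit(X, y)

# Raw scores z, one per sample
scores = clf.decision_function(X)

# The score is the linear combination of features, weights, and intercept
manual_scores = X @ clf.coef_[0] + clf.intercept_[0]
print(np.allclose(scores, manual_scores))

# Applying the sigmoid to the score gives the class-1 probability
probs_from_scores = 1.0 / (1.0 + np.exp(-scores))
print(np.allclose(probs_from_scores, clf.predict_proba(X)[:, 1]))

# The sign of the score decides the predicted class
print(np.array_equal((scores > 0).astype(int), clf.predict(X)))
```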
Evaluating Classifier Performance
Accuracy Score
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
Classification Report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
Full Classifier Code
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Load data
data = pd.read_csv("data/customers.csv")
# Features and target
X = data[['Age', 'Salary']]
y = data['Purchased']
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.25, random_state=42
)
# Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Build classifier
model = LogisticRegression()
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
Advantages of Logistic Regression Classifier
- Simple and easy to implement
- Fast training process
- Works well for binary classification
- Provides probability outputs
- Highly interpretable results
Limitations
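- Assumes a linear decision boundary in the feature space
- Struggles with complex nonlinear data unless nonlinear features are engineered
- Can be sensitive to outliers
- Benefits from feature scaling: not strictly required, but scaling helps the solver converge and lets the default regularization treat features evenly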
Real-World Applications
Logistic Regression classifiers are used in:
- Email spam detection
- Credit card fraud detection
- Medical diagnosis systems
- Customer churn prediction
- Marketing campaign prediction
Best Practices
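- Scale numerical features, fitting the scaler on the training set only
- Split the data before preprocessing so the test set stays unseen
- Check for class imbalance
- Evaluate with multiple metrics (accuracy, confusion matrix, precision, recall)
- Save the trained model, together with the fitted scaler, for reuse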
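Several of these practices can be combined by wrapping the scaler and the classifier in a Scikit-Learn pipeline and saving it with joblib. This is a minimal sketch under stated assumptions: the data is synthetic, and the file name logreg_pipeline.joblib is a hypothetical choice.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the customer dataset
X, y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0, random_state=42
)

# stratify=y keeps the class ratio the same in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# The pipeline fits the scaler on the training data only, then the classifier
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X_train, y_train)

# Save scaler and model together (hypothetical file name, temporary directory)
model_path = os.path.join(tempfile.mkdtemp(), "logreg_pipeline.joblib")
joblib.dump(pipeline, model_path)

# Reload and confirm the restored pipeline behaves the same
restored = joblib.load(model_path)
same = bool((restored.predict(X_test) == pipeline.predict(X_test)).all())
print("Restored model gives identical predictions:", same)
```

Saving the whole pipeline rather than the bare model means the scaling step cannot be forgotten or mismatched when the model is reused later.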
Conclusion
Building a Logistic Regression classifier in Python is a fundamental step in machine learning. It allows you to train a model that can predict categories based on input data and return meaningful probability values.
By using Scikit-Learn, you can quickly build, train, and evaluate classifiers for real-world problems such as customer prediction, fraud detection, and medical diagnosis. Mastering this step completes the core workflow of Logistic Regression modeling.

