NumPy Handling Missing Data
In real-world datasets, missing or invalid values are very common.
These missing values are often represented as:
- NaN (Not a Number)
- None (in some cases)
- Empty or corrupted values
NumPy provides powerful tools to detect, replace, and handle missing data efficiently.
What is Missing Data?
Missing data means:
A value is not available or cannot be represented correctly.
Example:
[10, 20, NaN, 40, NaN]
Why Is Handling Missing Data Important?
- Prevents NaN from silently propagating through calculations
- Improves data quality
- Required for machine learning
- Ensures accurate analysis
- Helps clean real-world datasets
1. Creating Missing Data (NaN)
import numpy as np
arr = np.array([1, 2, np.nan, 4, 5])
print(arr)
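One detail worth knowing here: `np.nan` is a floating-point value, so an array that contains it is stored as `float64` even when the other entries look like integers. The following is a minimal sketch of what that implies; `int_arr` and `assigned` are illustrative names, not part of the original example.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# NaN is a float, so the whole array is promoted to float64.
print(arr.dtype)          # float64

# NaN never compares equal to anything, including itself,
# which is why == cannot be used to find it.
print(np.nan == np.nan)   # False

# An integer array has no way to store NaN (illustrative check).
int_arr = np.array([1, 2, 3])
try:
    int_arr[0] = np.nan
    assigned = True
except ValueError:
    assigned = False
print(assigned)           # False
```

In practice this means integer data usually has to be converted to a float dtype before NaN can mark its missing entries.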
2. Checking for Missing Values (isnan)
print(np.isnan(arr))
Output
[False False  True False False]
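The boolean mask returned by `np.isnan()` can also be summarized. As a small sketch (the names `mask`, `n_missing`, `positions`, and `has_missing` are chosen here for illustration), it can be used to count missing values and find where they are:

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])
mask = np.isnan(arr)

n_missing = int(mask.sum())        # each True counts as 1
positions = np.flatnonzero(mask)   # indices where the mask is True
has_missing = bool(mask.any())     # quick yes/no check

print(n_missing)    # 1
print(positions)    # [2]
print(has_missing)  # True
```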
3. Removing Missing Values
cleaned = arr[~np.isnan(arr)]
print(cleaned)
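The same masking idea extends to two-dimensional data, where one often wants to drop whole rows that contain a missing value. The sketch below assumes a small hypothetical array named `table`; it is one possible approach, not the only one.

```python
import numpy as np

# Hypothetical 3x3 table with one missing entry in the middle row.
table = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [7.0, 8.0, 9.0],
])

# True for every row that contains at least one NaN.
row_has_nan = np.isnan(table).any(axis=1)

# Keep only the rows without NaN.
complete_rows = table[~row_has_nan]
print(complete_rows)
```

Using `axis=0` instead would flag columns rather than rows.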
4. Replacing Missing Values
# Overwrite every NaN with 0 (this modifies arr in place)
arr[np.isnan(arr)] = 0
print(arr)
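Zero is not always a sensible replacement, since it can distort later statistics. As an alternative sketch, the missing entries can be filled with the mean of the valid values using `np.where()`, which returns a new array and leaves the original untouched. The names `mean_value` and `filled` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# Mean of the valid values only: (1 + 2 + 4 + 5) / 4 = 3.0
mean_value = np.nanmean(arr)

# Pick mean_value where arr is NaN, otherwise keep the original value.
filled = np.where(np.isnan(arr), mean_value, arr)

print(filled)   # [1. 2. 3. 4. 5.]
```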
5. Handling Infinite Values
arr = np.array([1, 2, np.inf, -np.inf, 5])
print(np.isfinite(arr))
6. Replacing Infinite Values
# ~np.isfinite(arr) is True for inf, -inf, and NaN, so all three are set to 0
arr[~np.isfinite(arr)] = 0
print(arr)
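When infinities and NaN need different treatment, `np.isinf()` selects only the infinite entries, and `np.nan_to_num()` accepts separate `nan`, `posinf`, and `neginf` replacements. The sketch below uses arbitrary caps of 100 and -100 purely for illustration.

```python
import numpy as np

arr = np.array([1, 2, np.inf, -np.inf, np.nan])

# True only for +inf and -inf, not for NaN.
inf_only = np.isinf(arr)

# Replace each kind of special value with its own substitute.
# The caps 100.0 and -100.0 are arbitrary illustrative choices.
capped = np.nan_to_num(arr, nan=0.0, posinf=100.0, neginf=-100.0)

print(inf_only.tolist())  # [False, False, True, True, False]
print(capped.tolist())    # [1.0, 2.0, 100.0, -100.0, 0.0]
```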
7. Using genfromtxt() for Missing Data
# Empty fields in data.csv are filled with 0 instead of NaN
data = np.genfromtxt("data.csv", delimiter=",", filling_values=0)
print(data)
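To try this without a file on disk, the CSV content can be supplied through `io.StringIO`. The sketch below uses made-up CSV text with one empty field and compares the default behavior (the empty field becomes NaN) with `filling_values=0`.

```python
import io
import numpy as np

# Made-up CSV content; the middle field of the second row is empty.
csv_text = "1,2,3\n4,,6\n7,8,9\n"

# Default: the empty field is read as NaN.
with_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# With filling_values, the empty field is replaced while loading.
with_zero = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=0)

print(with_nan)
print(with_zero)
```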
8. Ignoring NaN in Calculations
arr = np.array([1, 2, np.nan, 4, 5])
print(np.nansum(arr))
print(np.nanmean(arr))
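To see why the NaN-aware variants matter, it helps to compare them with the ordinary functions, which return NaN as soon as a single NaN is present. A short sketch (variable names are illustrative):

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

plain_sum = np.sum(arr)       # nan: the missing value propagates
safe_sum = np.nansum(arr)     # 12.0: NaN is skipped
safe_mean = np.nanmean(arr)   # 3.0: mean of the four valid values
safe_max = np.nanmax(arr)     # 5.0: NaN is skipped

print(plain_sum, safe_sum, safe_mean, safe_max)
```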
9. Real-World Example: Sensor Data Cleaning
data = np.array([10, 20, np.nan, 40, 50])
data = np.nan_to_num(data, nan=0)
print(data)
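For sensor readings taken at regular intervals, a zero can look like a real measurement. One alternative, shown here as a sketch rather than a recommendation, is linear interpolation between neighboring valid readings with `np.interp()`. It assumes the samples are evenly spaced; `idx`, `missing`, and `interpolated` are illustrative names.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, 50])

idx = np.arange(data.size)   # sample positions 0..4 (assumed evenly spaced)
missing = np.isnan(data)     # mask of gaps

# Estimate each gap from the surrounding valid readings.
interpolated = data.copy()
interpolated[missing] = np.interp(idx[missing], idx[~missing], data[~missing])

print(interpolated)   # [10. 20. 30. 40. 50.]
```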
10. Real-World Example: Temperature Data
temps = np.array([30.5, np.nan, 28.0, 29.5])
# Average the three valid readings, skipping the NaN
mean_temp = np.nanmean(temps)
print(mean_temp)
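The same function works along an axis of a two-dimensional array. The sketch below assumes a hypothetical layout with one row per weather station and one column per reading, computes a mean per station, and then fills each gap with its own station's mean.

```python
import numpy as np

# Hypothetical data: rows are stations, columns are readings.
station_temps = np.array([
    [30.5, np.nan, 28.0, 29.5],
    [31.0, 29.0, np.nan, 30.0],
])

# One mean per row, ignoring NaN within each row.
station_means = np.nanmean(station_temps, axis=1)

# Broadcast each row's mean into that row's gaps.
filled_temps = np.where(
    np.isnan(station_temps),
    station_means[:, None],
    station_temps,
)

print(station_means)
print(filled_temps)
```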
11. Common Missing Data Functions
| Function | Purpose |
|---|---|
| np.isnan() | Detect NaN |
| np.isfinite() | Check for finite values (not NaN, not ±inf) |
| np.nan_to_num() | Replace NaN (and, by default, ±inf) with numbers |
| np.nansum() | Sum ignoring NaN |
| np.nanmean() | Mean ignoring NaN |
12. Visualization of Missing Data
Raw Data → [10, NaN, 20, NaN, 30]
Clean Data → [10, 0, 20, 0, 30]
Advantages of Handling Missing Data
- Improves data accuracy
- Avoids NaN results and warnings in calculations
- Essential for ML models
- Produces clean datasets for analysis
- Leads to better predictions
Summary
NumPy provides powerful tools to detect, remove, and replace missing values using functions like isnan(), nan_to_num(), and nansum(). These tools are essential for real-world data cleaning.
All of these functions ship with NumPy itself, so no additional packages are required.
Conclusion
Handling missing data is a critical step in data preprocessing. With NumPy, you can easily clean datasets and ensure accurate analysis for machine learning and data science workflows.

