NumPy Removing Missing Data
In real-world datasets, missing values are very common.
After identifying missing data, the next important step is to remove or clean it.
NumPy represents missing numeric values as NaN ("not a number") in floating-point arrays, and boolean masks make it straightforward to filter them out.
What is Missing Data Removal?
Removing missing data means filtering invalid or undefined values out of an array. The result is a new, shorter array; the original is left unchanged.
Example:
Before: [10, NaN, 20, NaN, 30]
After: [10, 20, 30]
Why Remove Missing Data?
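- Improves data quality by leaving only valid measurements
- Prevents NaN from propagating through calculations (the sum or mean of an array that contains NaN is itself NaN)
- Required by many machine learning models, which cannot accept NaN input
- Keeps summary statistics meaningful
- Cleans real-world datasets before further processing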
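The propagation point above can be seen in a minimal sketch. The array contents and variable names are illustrative, not taken from any particular dataset.

```python
import numpy as np

# Hypothetical readings with one missing value
values = np.array([10.0, np.nan, 20.0, 30.0])

# A single NaN makes the ordinary mean NaN as well
mean_with_nan = np.mean(values)

# Drop the NaN entries first, then compute the mean of what is left
cleaned = values[~np.isnan(values)]
mean_cleaned = np.mean(cleaned)   # (10 + 20 + 30) / 3

print(mean_with_nan)   # nan
print(mean_cleaned)    # 20.0
```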
1. Removing NaN Values Using Boolean Indexing
import numpy as np
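# np.nan is a float, so the whole array is stored as float64
arr = np.array([1, 2, np.nan, 4, 5])
# np.isnan() marks NaN entries as True; ~ inverts the mask to keep the rest
cleaned = arr[~np.isnan(arr)]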
print(cleaned)
Output
[1. 2. 4. 5.]
2. Removing Missing Values in 2D Arrays
arr = np.array([
[1, 2, np.nan],
[4, np.nan, 6]
])
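# any(axis=1) flags each row that holds at least one NaN
cleaned = arr[~np.isnan(arr).any(axis=1)]
# Both rows here contain a NaN, so the result is empty with shape (0, 3)
print(cleaned)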
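If the goal is to keep every valid value rather than whole rows, the element-wise mask can be applied directly to the 2D array. A minimal sketch, using the same data under the assumed name grid; note that the result is flattened to one dimension because the surviving values no longer form a rectangle.

```python
import numpy as np

grid = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6]
])

# An element-wise boolean mask on a 2D array returns a 1D array
flat_clean = grid[~np.isnan(grid)]

print(flat_clean)        # [1. 2. 4. 6.]
print(flat_clean.shape)  # (4,)
```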
3. Removing Rows with Missing Data
arr = np.array([
[1, 2, np.nan],
[4, 5, 6],
[7, np.nan, 9]
])
clean_rows = arr[~np.isnan(arr).any(axis=1)]
print(clean_rows)
Output
[[4. 5. 6.]]
4. Removing Columns with Missing Data
arr = np.array([
[1, 2, np.nan],
[4, 5, 6]
])
# any(axis=0) flags each column that holds at least one NaN
clean_cols = arr[:, ~np.isnan(arr).any(axis=0)]
print(clean_cols)
Output
[[1. 2.]
 [4. 5.]]
5. Removing Infinite Values
arr = np.array([1, 2, np.inf, -np.inf, 5])
# np.isfinite() is False for inf, -inf, and NaN, so this mask drops all three
cleaned = arr[np.isfinite(arr)]
print(cleaned)
Output
[1. 2. 5.]
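Because np.isfinite() also rejects NaN, it behaves differently from a mask built with np.isinf(). The sketch below contrasts the two on an illustrative array (the variable names are assumptions for this example).

```python
import numpy as np

mixed = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# Keeps only ordinary numbers: NaN, inf, and -inf are all dropped
finite_only = mixed[np.isfinite(mixed)]

# Drops only inf and -inf; the NaN entry is kept
no_inf = mixed[~np.isinf(mixed)]

print(finite_only)  # [1. 5.]
print(no_inf)       # [ 1. nan  5.]
```

Use the np.isinf() form only when NaN values need to survive for later handling.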
6. Using np.nan_to_num() Instead of Removing
arr = np.array([1, np.nan, 3, np.nan, 5])
# Replace each NaN with 0 instead of dropping it; the array keeps its length
cleaned = np.nan_to_num(arr, nan=0)
print(cleaned)
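np.nan_to_num() also handles infinities. By default it replaces them with the largest and smallest finite values of the array's dtype, and the posinf and neginf parameters override that. A short sketch; the replacement values 999.0 and -999.0 are arbitrary choices for illustration.

```python
import numpy as np

raw = np.array([1.0, np.nan, np.inf, -np.inf])

# Explicit replacements for NaN, +inf, and -inf
replaced = np.nan_to_num(raw, nan=0.0, posinf=999.0, neginf=-999.0)

# Defaults: NaN -> 0.0, +inf -> largest finite float, -inf -> most negative finite float
defaulted = np.nan_to_num(raw)

print(replaced)  # [   1.    0.  999. -999.]
print(defaulted[2] == np.finfo(raw.dtype).max)  # True
```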
7. Removing Missing Data in a Real Dataset
data = np.array([10, 20, np.nan, 40, 50])
clean = data[~np.isnan(data)]
print(clean)
8. Removing Missing Data with Filtering
arr = np.array([5, np.nan, 15, np.nan, 25])
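# Works, but loops in Python and returns a list, not a NumPy array;
# boolean indexing (arr[~np.isnan(arr)]) is the faster, idiomatic choice
filtered = [x for x in arr if not np.isnan(x)]
print(filtered)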
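The two approaches keep the same values but return different types, which matters if later code expects an array. A small comparison sketch (the names as_list and as_array are made up for this example):

```python
import numpy as np

arr = np.array([5, np.nan, 15, np.nan, 25])

as_list = [x for x in arr if not np.isnan(x)]   # Python list
as_array = arr[~np.isnan(arr)]                  # NumPy array

print(type(as_list).__name__)    # list
print(type(as_array).__name__)   # ndarray

# Same values once the list is converted back to an array
same_values = np.array_equal(np.array(as_list), as_array)
print(same_values)               # True
```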
9. Counting After Removal
arr = np.array([1, np.nan, 2, np.nan, 3])
cleaned = arr[~np.isnan(arr)]
print(len(cleaned))
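It can also be useful to count how many values are missing before removing anything. One way to sketch this is with np.count_nonzero() on the NaN mask:

```python
import numpy as np

arr = np.array([1, np.nan, 2, np.nan, 3])

# True counts as 1, so this counts the NaN entries
missing = np.count_nonzero(np.isnan(arr))

# Everything that is not missing is valid
valid = arr.size - missing

print(missing)  # 2
print(valid)    # 3
```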
10. Real-World Example: Sensor Data Cleaning
sensor = np.array([100, np.nan, 200, 300, np.nan])
cleaned = sensor[~np.isnan(sensor)]
print(cleaned)
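With sensor data, the position of each reading often stands for a time step, and that information is lost once the NaN entries are dropped. A possible sketch, assuming the array index is the sample number, keeps the surviving indices alongside the values using np.flatnonzero():

```python
import numpy as np

sensor = np.array([100, np.nan, 200, 300, np.nan])

valid_mask = ~np.isnan(sensor)

# Positions of the readings that survive the cleaning step
valid_index = np.flatnonzero(valid_mask)
cleaned = sensor[valid_mask]

print(valid_index)  # [0 2 3]
print(cleaned)      # [100. 200. 300.]
```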
11. Key Methods for Removing Missing Data
| Method | Purpose |
|---|---|
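| ~np.isnan() | Build a mask that keeps non-NaN values |
| np.isfinite() | Build a mask that keeps finite values (drops NaN, inf, and -inf) |
| np.nan_to_num() | Replace NaN and infinite values instead of removing them |
| Boolean indexing | Apply a mask to filter an array |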
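The methods in the table can be combined into one small helper. The function below is a hypothetical sketch, not part of NumPy: drop_missing and its axis convention are assumptions made for this example, and the row and column modes expect a 2D array.

```python
import numpy as np

def drop_missing(arr, axis=None):
    """Hypothetical helper: remove NaN and infinite values.

    axis=None drops individual elements (the result is 1D),
    axis=0 drops rows that contain a bad value,
    axis=1 drops columns that contain a bad value.
    """
    arr = np.asarray(arr, dtype=float)
    bad = ~np.isfinite(arr)              # True for NaN, inf, -inf
    if axis is None:
        return arr[~bad]
    if axis == 0:
        return arr[~bad.any(axis=1)]     # keep rows with no bad value
    if axis == 1:
        return arr[:, ~bad.any(axis=0)]  # keep columns with no bad value
    raise ValueError("axis must be None, 0, or 1")

table = np.array([
    [1, 2, np.nan],
    [4, 5, 6]
])

print(drop_missing(table))          # [1. 2. 4. 5. 6.]
print(drop_missing(table, axis=0))  # [[4. 5. 6.]]
print(drop_missing(table, axis=1))
```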
Advantages of Removing Missing Data
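- Produces datasets free of NaN and infinite values
- Lets models that cannot handle NaN train and predict
- Prevents NaN from propagating into results
- Makes summary statistics meaningful
- A standard step in ML preprocessing workflows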
Summary
NumPy removes missing data with boolean indexing: masks built from np.isnan() or np.isfinite() select the valid elements, rows, or columns. When values should be replaced rather than dropped, np.nan_to_num() substitutes a number for each NaN. These methods are standard steps for preparing clean datasets in data science.
Conclusion
Removing missing data is a crucial step in data preprocessing. With NumPy, you can easily clean datasets and ensure accurate, reliable results for analysis and machine learning.

