NumPy Identifying Missing Values
In real-world datasets, missing or invalid values are very common.
Before cleaning or processing data, the first step is to identify missing values.
NumPy provides powerful functions to detect:
- NaN (Not a Number)
- Infinite values
- Invalid numerical entries
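One kind of invalid entry worth checking for is a Python None mixed into otherwise numeric data. The sketch below assumes an illustrative list named raw: without an explicit dtype NumPy builds an object array, which np.isnan() rejects, while requesting a float dtype turns None into NaN.

```python
import numpy as np

raw = [1.5, None, 3.0]  # illustrative input, not from a real dataset

# Without a dtype, None forces an object array, which np.isnan cannot process.
as_object = np.array(raw)
try:
    np.isnan(as_object)
    object_supported = True
except TypeError:
    object_supported = False
print(as_object.dtype, object_supported)   # object False

# Requesting float converts None to NaN, so the usual check works.
as_float = np.array(raw, dtype=float)
print(np.isnan(as_float).tolist())         # [False, True, False]
```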
What Are Missing Values?
Missing values are data points that are undefined, unavailable, or corrupted. In NumPy they are usually represented by np.nan, a special floating-point value.
Example:
[10, 20, NaN, 40, NaN]
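A dedicated check is needed because NaN never compares equal to anything, including itself, so an ordinary == test cannot find it. A minimal sketch, using the illustrative array name values:

```python
import numpy as np

values = np.array([10, 20, np.nan, 40, np.nan])

# NaN is a floating-point value, so the whole array is stored as float64.
print(values.dtype)            # float64

# Equality comparison never matches NaN, not even against np.nan itself.
equal_mask = values == np.nan
print(equal_mask.any())        # False

# np.isnan is the reliable test.
nan_mask = np.isnan(values)
print(nan_mask.sum())          # 2
```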
Why Is Identifying Missing Values Important?
- Prevents NaN from silently propagating through calculations
- Helps clean datasets
- Essential for machine learning
- Improves data accuracy
- Required before preprocessing
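To illustrate the first point, here is a small sketch of how a single NaN affects an ordinary aggregate. The array name readings is illustrative, and np.nanmean() is shown only for contrast.

```python
import numpy as np

readings = np.array([10.0, 20.0, np.nan, 40.0])

# A single NaN makes the ordinary mean NaN as well, without raising an error.
plain_mean = np.mean(readings)
print(plain_mean)              # nan

# np.nanmean skips NaN entries and averages the remaining three values.
safe_mean = np.nanmean(readings)
print(safe_mean)
```

Because no exception is raised, the problem is easy to miss unless the data is checked first.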
1. Checking NaN Values (np.isnan())
np.isnan() is the most common way to detect missing values. It returns a boolean array that is True wherever the input is NaN.
import numpy as np
arr = np.array([1, 2, np.nan, 4, np.nan])
print(np.isnan(arr))
Output
[False False  True False  True]
2. Counting Missing Values
# arr is the array from section 1; each True counts as 1, so the sum is the number of NaNs
missing_count = np.sum(np.isnan(arr))
print(missing_count)  # 2
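Two closely related variations are sketched below: np.count_nonzero() counts the True entries directly, and the mean of the mask gives the fraction of values that are missing.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, np.nan])
mask = np.isnan(arr)

# count_nonzero counts the True entries of the mask.
missing_count = np.count_nonzero(mask)
print(missing_count)           # 2

# The mean of a boolean mask is the fraction of True entries.
missing_fraction = mask.mean()
print(missing_fraction)        # 0.4
```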
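3. Checking Infinite Values (np.isfinite())
np.isfinite() returns True for ordinary numbers and False for positive infinity, negative infinity, and NaN.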
arr = np.array([1, 2, np.inf, -np.inf, 5])
print(np.isfinite(arr))
Output
[ True  True False False  True]
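When only the infinities are of interest, np.isinf() (and the sign-specific np.isposinf() and np.isneginf()) may be a better fit than np.isfinite(). The sketch below compares them on an array that also contains a NaN.

```python
import numpy as np

arr = np.array([1, np.nan, np.inf, -np.inf, 5])

# isfinite is False for NaN as well as for both infinities.
finite_mask = np.isfinite(arr)
print(finite_mask.tolist())    # [True, False, False, False, True]

# isinf flags only the infinities and leaves NaN as False.
inf_mask = np.isinf(arr)
print(inf_mask.tolist())       # [False, False, True, True, False]

# The sign-specific variants separate +inf from -inf.
pos_inf_mask = np.isposinf(arr)
neg_inf_mask = np.isneginf(arr)
print(pos_inf_mask.tolist())   # [False, False, True, False, False]
print(neg_inf_mask.tolist())   # [False, False, False, True, False]
```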
4. Identifying Both NaN and Infinite Values
arr = np.array([1, np.nan, np.inf, 4])
# ~ inverts the mask: True now marks every value that is NaN or infinite
invalid = ~np.isfinite(arr)
print(invalid)  # [False  True  True False]
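The inverted np.isfinite() mask is equivalent to combining the NaN and infinity checks with a logical OR. A short sketch that confirms this and also counts each kind separately:

```python
import numpy as np

arr = np.array([1, np.nan, np.inf, 4])

invalid = ~np.isfinite(arr)

# The same mask built from its two parts.
combined = np.isnan(arr) | np.isinf(arr)
masks_match = np.array_equal(invalid, combined)
print(masks_match)             # True

# Keeping the parts separate shows what kind of problem each entry has.
nan_count = int(np.isnan(arr).sum())
inf_count = int(np.isinf(arr).sum())
print(nan_count, inf_count)    # 1 1
```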
5. Finding Indices of Missing Values
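arr = np.array([10, np.nan, 30, np.nan, 50])
# np.where with a single argument returns a tuple holding one index array per dimension
indices = np.where(np.isnan(arr))
print(indices)  # (array([1, 3]),)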
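Because np.where() returns a tuple, the index array for a 1D input is its first element. The sketch below unpacks it and shows np.flatnonzero() as an alternative that returns the index array directly.

```python
import numpy as np

arr = np.array([10, np.nan, 30, np.nan, 50])

# Take element 0 of the tuple to get a plain index array.
indices = np.where(np.isnan(arr))[0]
print(indices.tolist())        # [1, 3]

# flatnonzero returns the indices of True entries without the tuple wrapper.
flat_indices = np.flatnonzero(np.isnan(arr))
print(flat_indices.tolist())   # [1, 3]
```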
6. Filtering Valid Values Only
arr = np.array([1, np.nan, 3, np.nan, 5])
# Boolean indexing keeps only the positions where the mask is True
clean = arr[~np.isnan(arr)]
print(clean)  # [1. 3. 5.]
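The same mask can be reused on other arrays of the same length, which keeps related data aligned after filtering. In this sketch the timestamps array is a hypothetical companion to the values.

```python
import numpy as np

timestamps = np.array([0, 1, 2, 3, 4])        # hypothetical companion array
values = np.array([1, np.nan, 3, np.nan, 5])

# Build the mask once and apply it to both arrays.
valid = ~np.isnan(values)
clean_values = values[valid]
clean_timestamps = timestamps[valid]

print(clean_values.tolist())       # [1.0, 3.0, 5.0]
print(clean_timestamps.tolist())   # [0, 2, 4]

# Boolean indexing returns a copy, so the original array keeps its length.
print(values.size)                 # 5
```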
7. Identifying Missing Values in 2D Arrays
arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6]
])
# The mask has the same shape as the input array
print(np.isnan(arr))
# [[False False  True]
#  [False  True False]]
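To turn the 2D mask into row and column coordinates, np.argwhere() returns one [row, column] pair per missing value, while np.where() returns the rows and columns as separate arrays. A minimal sketch:

```python
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
])

# One [row, column] pair per NaN.
positions = np.argwhere(np.isnan(arr))
print(positions.tolist())            # [[0, 2], [1, 1]]

# The same coordinates as separate row and column index arrays.
rows, cols = np.where(np.isnan(arr))
print(rows.tolist(), cols.tolist())  # [0, 1] [2, 1]
```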
8. Counting Missing Values in 2D Arrays
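# arr is the 2D array from section 7; without an axis argument, np.sum counts over the whole array
missing = np.sum(np.isnan(arr))
print(missing)  # 2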
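Passing an axis argument gives per-column or per-row counts, which is often more useful than a single total. The sketch below uses a slightly larger illustrative array with one complete row.

```python
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
    [7, 8, 9],
])
mask = np.isnan(arr)

# axis=0 collapses the rows, giving one count per column.
per_column = mask.sum(axis=0)
print(per_column.tolist())         # [0, 1, 1]

# axis=1 collapses the columns, giving one count per row.
per_row = mask.sum(axis=1)
print(per_row.tolist())            # [1, 1, 0]

# any(axis=1) flags the rows that contain at least one NaN.
rows_with_missing = mask.any(axis=1)
print(rows_with_missing.tolist())  # [True, True, False]
```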
9. Real-World Example: Sensor Data
data = np.array([10, 20, np.nan, 40, 50])
print("Missing values:", np.isnan(data))
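A raw boolean mask is hard to read for longer series, so a short summary can be more practical. The following sketch reports how many readings are missing, what share of the data that is, and where they are. The message format is an arbitrary choice.

```python
import numpy as np

data = np.array([10, 20, np.nan, 40, 50])
mask = np.isnan(data)

missing_total = int(mask.sum())           # number of missing readings
missing_percent = 100 * mask.mean()       # share of the data that is missing
positions = np.flatnonzero(mask).tolist() # indices of the missing readings

summary = (
    f"{missing_total} of {data.size} readings missing "
    f"({missing_percent:.0f}%) at index {positions}"
)
print(summary)  # 1 of 5 readings missing (20%) at index [2]
```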
10. Real-World Example: Temperature Dataset
temps = np.array([30.5, np.nan, 28.0, 29.5])
print(np.isnan(temps))
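If each temperature belongs to a labelled day, the NaN mask can be applied to the labels to see which days lack a reading. The days array below is hypothetical and only serves to illustrate the idea.

```python
import numpy as np

days = np.array(["Mon", "Tue", "Wed", "Thu"])   # hypothetical labels
temps = np.array([30.5, np.nan, 28.0, 29.5])

# Index the labels with the NaN mask to list the days without a reading.
missing_days = days[np.isnan(temps)]
print(missing_days.tolist())       # ['Tue']

# Inverting the mask lists the days that do have a reading.
recorded_days = days[~np.isnan(temps)]
print(recorded_days.tolist())      # ['Mon', 'Wed', 'Thu']
```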
11. Key Functions for Identifying Missing Values
| Function | Purpose |
|---|---|
| np.isnan() | Detect NaN values |
| np.isfinite() | Detect values that are neither NaN nor infinite |
| np.where() | Find positions of flagged values |
| np.sum() | Count True entries in a boolean mask |
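The four functions in the table can be combined into one small report. The helper below, summarize_missing, is a hypothetical name and a sketch for 1D input only, not a NumPy function.

```python
import numpy as np

def summarize_missing(values):
    """Hypothetical helper: summarize NaN and non-finite entries of a 1D sequence."""
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)
    return {
        "nan_count": int(np.sum(nan_mask)),
        "non_finite_count": int(np.sum(~np.isfinite(values))),
        "nan_positions": np.where(nan_mask)[0].tolist(),
    }

report = summarize_missing([10, np.nan, 20, np.inf, 30])
print(report)
# {'nan_count': 1, 'non_finite_count': 2, 'nan_positions': [1]}
```

The non-finite count includes the NaN as well as the infinity, which is why it is 2 here.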
12. Visualization of Missing Values
Raw Data:
[10, NaN, 20, NaN, 30]
Detected:
[False, True, False, True, False]
Advantages of Identifying Missing Values
- Helps clean data early
- Prevents runtime errors
- Improves model accuracy
- Essential for data preprocessing
- Enables better analysis
Summary
NumPy provides simple tools for identifying missing values: np.isnan() flags NaN, np.isfinite() flags every value that is neither NaN nor infinite, and np.where() locates the flagged positions. These functions are part of core NumPy and are a standard step in preparing clean datasets in Python data science workflows.
Conclusion
Identifying missing values is an essential first step in data cleaning. With NumPy, you can quickly detect NaN and infinite values before they distort analysis or machine learning results.

