NumPy Removing Missing Data – Clean NaN Values from Arrays in Python

In real-world datasets, missing values are very common.

After identifying missing data, the next important step is to remove or clean it.

NumPy represents missing values as NaN (np.nan), a special floating-point value, and provides simple techniques to remove them.


What is Missing Data Removal?

Removing missing data means filtering invalid or undefined values out of an array.

Example:

Before: [10, NaN, 20, NaN, 30]  
After: [10, 20, 30]

Why Remove Missing Data?

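  • Improves data quality
  • Prevents NaN from propagating through calculations (for example, the mean of an array containing NaN is NaN)
  • Required by many machine learning models, which do not accept NaN inputs
  • Ensures accurate analysis
  • Cleans real-world datasets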

1. Removing NaN Values Using Boolean Indexing

import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

cleaned = arr[~np.isnan(arr)]

print(cleaned)

Output

[1. 2. 4. 5.]
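The one-liner above combines three steps: build a mask, invert it, and index with it. The sketch below spells those steps out; the names nan_mask and keep_mask are only illustrative. It also shows why np.isnan() is needed: NaN never compares equal to anything, including itself.

```python
import numpy as np

arr = np.array([1, 2, np.nan, 4, 5])

# NaN is not equal to itself, so an equality test finds nothing
print(arr == np.nan)   # [False False False False False]

nan_mask = np.isnan(arr)   # True where the value is NaN
keep_mask = ~nan_mask      # True where the value is valid
cleaned = arr[keep_mask]   # keep only the valid positions

print(nan_mask)   # [False False  True False False]
print(cleaned)    # [1. 2. 4. 5.]
```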

2. Removing Missing Values in 2D Arrays

arr = np.array([
[1, 2, np.nan],
[4, np.nan, 6]
])

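# Keep only the rows that contain no NaN
cleaned = arr[~np.isnan(arr).any(axis=1)]

print(cleaned)

Output

Both rows contain a NaN, so every row is dropped and the result is an empty array with shape (0, 3):

[]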
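A 2D array cannot have holes, so there are two ways to filter it, and they behave differently. The sketch below reuses the array from this section to contrast an element-wise mask, which flattens the result to 1D, with a per-row mask, which keeps the 2D shape but discards whole rows.

```python
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
])

# A full 2D mask selects individual elements, so the result is flattened to 1D
flat = arr[~np.isnan(arr)]
print(flat)         # [1. 2. 4. 6.]
print(flat.shape)   # (4,)

# A per-row mask keeps the 2D layout but drops every row that has a NaN
rows = arr[~np.isnan(arr).any(axis=1)]
print(rows.shape)   # (0, 3)
```

Which form is appropriate depends on whether the row and column structure still matters after cleaning.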

3. Removing Rows with Missing Data

arr = np.array([
[1, 2, np.nan],
[4, 5, 6],
[7, np.nan, 9]
])

clean_rows = arr[~np.isnan(arr).any(axis=1)]

print(clean_rows)

Output

[[4. 5. 6.]]
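Before discarding rows, it can help to know which ones are affected. This is a minimal sketch using np.flatnonzero() on the row mask; the variable names are illustrative.

```python
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, 5, 6],
    [7, np.nan, 9],
])

row_has_nan = np.isnan(arr).any(axis=1)   # one True/False per row

dropped = np.flatnonzero(row_has_nan)     # indices of rows with a NaN
kept = np.flatnonzero(~row_has_nan)       # indices of rows that survive

print(dropped)   # [0 2]
print(kept)      # [1]
```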

4. Removing Columns with Missing Data

arr = np.array([
[1, 2, np.nan],
[4, 5, 6]
])

clean_cols = arr[:, ~np.isnan(arr).any(axis=0)]

print(clean_cols)

Output

[[1. 2.]
 [4. 5.]]
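Row and column removal differ only in the axis that any() reduces over and in where the mask is applied. The helper below, drop_nan, is a hypothetical convenience function rather than part of NumPy. It is a sketch that assumes a 2D float array.

```python
import numpy as np

def drop_nan(arr, axis):
    """Hypothetical helper: axis=0 drops rows, axis=1 drops columns that contain NaN."""
    # To decide about rows we reduce across columns (axis=1), and vice versa
    mask = ~np.isnan(arr).any(axis=1 - axis)
    return arr[mask] if axis == 0 else arr[:, mask]

arr = np.array([
    [1, 2, np.nan],
    [4, 5, 6],
])

print(drop_nan(arr, axis=0))   # rows without NaN: [[4. 5. 6.]]
print(drop_nan(arr, axis=1))   # columns without NaN: first two columns
```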

5. Removing Infinite Values

arr = np.array([1, 2, np.inf, -np.inf, 5])

# np.isfinite() is False for inf, -inf, and NaN, so this mask removes all three
cleaned = arr[np.isfinite(arr)]

print(cleaned)

Output

[1. 2. 5.]
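Because np.isfinite() treats NaN as non-finite, it removes NaN along with the infinities. If only the infinite values should go, np.isinf() is the narrower test. The sketch below compares the two on an array that contains both kinds of value.

```python
import numpy as np

arr = np.array([1, np.nan, np.inf, -np.inf, 5])

# Drops NaN, inf, and -inf
only_finite = arr[np.isfinite(arr)]
print(only_finite)   # [1. 5.]

# Drops inf and -inf only; the NaN is kept
no_inf = arr[~np.isinf(arr)]
print(no_inf.size)             # 3
print(np.isnan(no_inf).any())  # True
```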

6. Using np.nan_to_num() Instead of Removing

arr = np.array([1, np.nan, 3, np.nan, 5])

cleaned = np.nan_to_num(arr, nan=0)

print(cleaned)

Output

[1. 0. 3. 0. 5.]
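np.nan_to_num() also rewrites infinities: by default it replaces them with the largest and smallest finite values of the array's dtype. The posinf and neginf parameters override that. The replacement values 999.0 and -999.0 below are arbitrary placeholders chosen for illustration.

```python
import numpy as np

arr = np.array([1, np.nan, np.inf, -np.inf])

# Defaults: NaN -> 0.0, inf -> largest finite float, -inf -> most negative finite float
default = np.nan_to_num(arr)
print(default[2] == np.finfo(np.float64).max)   # True

# Explicit replacements for each kind of value
custom = np.nan_to_num(arr, nan=0.0, posinf=999.0, neginf=-999.0)
print(custom)   # [   1.    0.  999. -999.]
```

Replacing keeps the array length unchanged, which matters when positions must stay aligned with another array.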

7. Removing Missing Data in a Real Dataset

data = np.array([10, 20, np.nan, 40, 50])

clean = data[~np.isnan(data)]

print(clean)

Output

[10. 20. 40. 50.]

8. Removing Missing Data with Filtering

arr = np.array([5, np.nan, 15, np.nan, 25])

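# A list comprehension also works, but it returns a Python list rather than
# a NumPy array and is slower than boolean indexing on large arrays
filtered = [x for x in arr if not np.isnan(x)]

print(filtered)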
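The comprehension and the boolean mask select the same values; they differ in the type of the result. The short check below is a sketch that converts the list back to an array and compares it with the masked version.

```python
import numpy as np

arr = np.array([5, np.nan, 15, np.nan, 25])

filtered = [x for x in arr if not np.isnan(x)]   # Python list
masked = arr[~np.isnan(arr)]                     # NumPy array

print(type(filtered).__name__)   # list
print(type(masked).__name__)     # ndarray

# Converting the list back gives the same values as the mask
print(np.array_equal(np.array(filtered), masked))   # True
```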

9. Counting After Removal

arr = np.array([1, np.nan, 2, np.nan, 3])

cleaned = arr[~np.isnan(arr)]

print(len(cleaned))

Output

3
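It is often just as useful to know how many values were removed. Summing the NaN mask counts them directly, since True counts as 1. This is a minimal sketch with illustrative variable names.

```python
import numpy as np

arr = np.array([1, np.nan, 2, np.nan, 3])

n_missing = int(np.isnan(arr).sum())   # number of NaN values
cleaned = arr[~np.isnan(arr)]

print(n_missing)                 # 2
print(cleaned.size)              # 3
print(arr.size - cleaned.size)   # 2, same as n_missing
```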

10. Real-World Example: Sensor Data Cleaning

sensor = np.array([100, np.nan, 200, 300, np.nan])

cleaned = sensor[~np.isnan(sensor)]

print(cleaned)

Output

[100. 200. 300.]
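Cleaning usually comes before a summary statistic. The sketch below shows what happens to the mean of the sensor readings with and without cleaning. It also uses np.nanmean(), which ignores NaN without removing it from the array.

```python
import numpy as np

sensor = np.array([100, np.nan, 200, 300, np.nan])

# A single NaN makes the plain mean NaN
print(np.isnan(sensor.mean()))   # True

# Remove NaN first, then average
cleaned = sensor[~np.isnan(sensor)]
print(cleaned.mean())            # 200.0

# Or let NumPy skip NaN during the calculation
print(np.nanmean(sensor))        # 200.0
```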

11. Key Methods for Removing Missing Data

Method                  Purpose
~np.isnan()             Build a mask that keeps non-NaN values
np.isfinite()           Build a mask that drops NaN, Inf, and -Inf
np.nan_to_num()         Replace missing values instead of removing them
Boolean indexing        Filter an array with a mask

Advantages of Removing Missing Data

  • Clean datasets
  • Better model accuracy
  • Prevents errors
  • Improves analysis quality
  • Essential for ML workflows

Summary

NumPy provides simple and efficient ways to remove missing data using boolean indexing and utility functions. These methods are essential for preparing clean datasets in data science.

These techniques are built into NumPy and are widely used in Python data applications.


Conclusion

Removing missing data is a crucial step in data preprocessing. With NumPy, you can easily clean datasets and ensure accurate, reliable results for analysis and machine learning.



