NumPy Imputing Missing Data – Replace NaN Values in Python Arrays

Missing data is a common problem in real-world datasets.

Instead of removing missing values, another approach is to impute them.

Imputation means replacing missing values with meaningful substitutes so that valuable data is not lost.


What is Data Imputation?

Data imputation is the process of replacing missing values with estimated or calculated values.

Example, filling each gap with the mean of the observed values (10, 20, and 30 average to 20):

Before:
[10, NaN, 20, NaN, 30]

After:
[10, 20, 20, 20, 30]

Why Impute Missing Data?

  • Preserve dataset size
  • Prevent data loss
  • Can improve machine learning accuracy compared with dropping records
  • Handle incomplete records
  • Leave the mean (or median) unchanged when that same statistic is used as the fill value

1. Creating an Array with Missing Values

import numpy as np

arr = np.array([10, 20, np.nan, 40, np.nan])

print(arr)

Output

[10. 20. nan 40. nan]
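
Before imputing, it usually helps to check how many values are missing and where they are. The following is a minimal sketch using the same array; the variable names mask, missing_count, and missing_positions are illustrative choices, not part of NumPy.

```python
import numpy as np

arr = np.array([10, 20, np.nan, 40, np.nan])

# NaN is a floating-point value, so NumPy stores the whole array as float64.
print(arr.dtype)            # float64

# np.isnan returns a boolean mask that is True where a value is missing.
mask = np.isnan(arr)

# Count and locate the missing entries.
missing_count = int(mask.sum())
missing_positions = np.flatnonzero(mask)

print(missing_count)        # 2
print(missing_positions)    # [2 4]
```

The same mask is what the later sections use to select the positions that get overwritten.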

2. Replacing Missing Values with Zero

Zero imputation is the simplest method: np.nan_to_num() with nan=0 returns a new array in which every NaN is replaced by 0.

import numpy as np

arr = np.array([10, 20, np.nan, 40])

imputed = np.nan_to_num(arr, nan=0)

print(imputed)

Output

[10. 20.  0. 40.]
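
One detail worth knowing: by default np.nan_to_num() also converts positive and negative infinity to very large finite numbers. If only NaN should change, np.where() with an np.isnan() mask is one alternative. The sketch below uses a made-up array that contains an infinity to show the difference.

```python
import numpy as np

# Made-up example values; the np.inf entry is there to show the difference.
arr = np.array([10, np.nan, np.inf, 40])

# nan_to_num replaces NaN with 0 and, by default, inf with a large finite number.
converted = np.nan_to_num(arr, nan=0)

# np.where changes only the NaN positions and returns a new array.
only_nan_filled = np.where(np.isnan(arr), 0.0, arr)

print(np.isinf(converted).any())   # False
print(only_nan_filled)             # [10.  0. inf 40.]
```

Neither call modifies arr itself, so the original data stays available for comparison.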

3. Imputing with Mean Value

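Mean imputation is a common statistical technique. np.nanmean() computes the mean while ignoring NaN, and the boolean mask from np.isnan() selects the positions to overwrite in place.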

import numpy as np

arr = np.array([10, 20, np.nan, 40])

mean_value = np.nanmean(arr)

arr[np.isnan(arr)] = mean_value

print(arr)

Output

[10.         20.         23.33333333 40.        ]
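
The assignment above changes arr in place. When the original array must stay untouched, the same idea can be wrapped in a function that returns a new array. This is only a sketch: impute_mean is a hypothetical helper name, not a NumPy function, and returning an all-NaN input unchanged is an assumption about the desired behavior.

```python
import numpy as np

def impute_mean(values):
    """Return a copy of `values` with NaN replaced by the mean of the rest.

    Hypothetical helper for illustration; not part of NumPy.
    """
    values = np.asarray(values, dtype=float)
    mask = np.isnan(values)
    if mask.all():
        # No observed values to average, so return the data unchanged.
        return values.copy()
    return np.where(mask, np.nanmean(values), values)

original = np.array([10, 20, np.nan, 40])
filled = impute_mean(original)

print(original)  # still contains nan
print(filled)    # NaN replaced by the mean of 10, 20, and 40
```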

4. Imputing with Median Value

The median is often a better choice when the data contains outliers, because extreme values do not pull it the way they pull the mean.

import numpy as np

arr = np.array([10, 20, np.nan, 40])

median_value = np.nanmedian(arr)

arr[np.isnan(arr)] = median_value

print(arr)

Output

[10. 20. 20. 40.]

5. Imputing with Maximum Value

import numpy as np

arr = np.array([10, 20, np.nan, 40])

max_value = np.nanmax(arr)

arr[np.isnan(arr)] = max_value

print(arr)

Output

[10. 20. 40. 40.]

6. Imputing with Minimum Value

import numpy as np

arr = np.array([10, 20, np.nan, 40])

min_value = np.nanmin(arr)

arr[np.isnan(arr)] = min_value

print(arr)

Output

[10. 20. 10. 40.]

7. Imputing Missing Values in a 2D Array

import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6],
])

mean_value = np.nanmean(arr)

arr[np.isnan(arr)] = mean_value

print(arr)

Output

[[1.   2.   3.25]
 [4.   3.25 6.  ]]

8. Column-Wise Imputation

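Replace missing values using the mean of each column. np.where() returns the row and column indices of the NaN entries, and the column indices are then used to pick the matching mean from col_means.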

import numpy as np

arr = np.array([
    [10, 20, np.nan],
    [40, np.nan, 60],
    [70, 80, 90],
])

col_means = np.nanmean(arr, axis=0)

inds = np.where(np.isnan(arr))

arr[inds] = np.take(col_means, inds[1])

print(arr)

Output

[[10. 20. 75.]
 [40. 50. 60.]
 [70. 80. 90.]]
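
The index-based approach above can also be written with broadcasting, which some readers find easier to follow. The sketch below is one possible alternative rather than the only way to do it; it also shows a row-wise variant, which the original example does not cover. The names by_column and by_row are illustrative.

```python
import numpy as np

arr = np.array([
    [10, 20, np.nan],
    [40, np.nan, 60],
    [70, 80, 90],
])

# Column-wise: col_means has shape (3,), which broadcasts across the rows.
col_means = np.nanmean(arr, axis=0)
by_column = np.where(np.isnan(arr), col_means, arr)

# Row-wise: keepdims=True gives shape (3, 1), which broadcasts across the columns.
row_means = np.nanmean(arr, axis=1, keepdims=True)
by_row = np.where(np.isnan(arr), row_means, arr)

print(by_column)
print(by_row)
```

Both calls return new arrays and leave arr unchanged. If a column or row consists entirely of NaN, np.nanmean() returns NaN for it and emits a RuntimeWarning, so those entries stay missing.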

9. Real-World Example: Student Scores

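import numpy as np

scores = np.array([80, 90, np.nan, 70, 85])

# Mean of the four recorded scores: (80 + 90 + 70 + 85) / 4 = 81.25
avg_score = np.nanmean(scores)

scores[np.isnan(scores)] = avg_score

print(scores)

Output

[80.   90.   81.25 70.   85.  ]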

10. Real-World Example: Temperature Dataset

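import numpy as np

temps = np.array([30.5, np.nan, 29.0, 31.0])

# Mean of the three recorded temperatures: 90.5 / 3
mean_temp = np.nanmean(temps)

temps[np.isnan(temps)] = mean_temp

print(temps)

Output

[30.5        30.16666667 29.         31.        ]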

Common Imputation Techniques

Technique             Description
Zero Imputation       Replace with 0
Mean Imputation       Replace with average
Median Imputation     Replace with median
Maximum Imputation    Replace with max value
Minimum Imputation    Replace with min value
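
The techniques in this table differ only in how the fill value is computed, so they can share one code path. The following is a rough sketch of that idea: the FILL_FUNCTIONS dictionary, the impute function, and the strategy names are all hypothetical, chosen here for illustration, and are not part of NumPy.

```python
import numpy as np

# Hypothetical lookup table: strategy name -> function that computes the fill value.
FILL_FUNCTIONS = {
    "zero": lambda values: 0.0,
    "mean": np.nanmean,
    "median": np.nanmedian,
    "max": np.nanmax,
    "min": np.nanmin,
}

def impute(values, strategy="mean"):
    """Return a copy of `values` with NaN replaced according to `strategy`."""
    if strategy not in FILL_FUNCTIONS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    values = np.asarray(values, dtype=float)
    fill_value = FILL_FUNCTIONS[strategy](values)
    return np.where(np.isnan(values), fill_value, values)

arr = np.array([10, 20, np.nan, 40])
for name in FILL_FUNCTIONS:
    print(name, impute(arr, name))
```

This sketch does not treat an all-NaN input specially; in that case the NumPy nan-functions return NaN or raise, depending on the function.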

Mean vs Median Imputation

Method     Best Use Case
Mean       Normally distributed data
Median     Data with outliers
Zero       Placeholder values
Max/Min    Special domain requirements
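
To see why the median is suggested for data with outliers, compare the two fill values on a small made-up array in which one value is far larger than the rest. The numbers are invented purely for illustration.

```python
import numpy as np

# Made-up values; 500 is the outlier.
data = np.array([10, 12, 11, np.nan, 500])

mean_fill = np.nanmean(data)      # (10 + 12 + 11 + 500) / 4
median_fill = np.nanmedian(data)  # middle of 10, 11, 12, 500

print(mean_fill)    # 133.25
print(median_fill)  # 11.5
```

The mean is dragged toward the outlier, so a gap filled with it would look nothing like the typical values, while the median stays close to them.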

Advantages of Imputing Missing Data

  • Keeps all records
  • Prevents information loss
  • Can improve model performance compared with discarding records
  • Maintains dataset structure
  • Easy to implement

Summary

Imputing missing data is an important preprocessing step in data science. NumPy provides powerful functions such as nanmean(), nanmedian(), and nan_to_num() to replace missing values efficiently.

These functions ship with NumPy itself, so no additional packages are needed.


Conclusion

Instead of deleting valuable records, imputation allows you to intelligently replace missing values and preserve your dataset. By using NumPy's built-in tools, you can prepare clean and complete data for analysis, visualization, and machine learning.



