NumPy Imputing Missing Data

Missing data is a common problem in real-world datasets.

Instead of removing missing values, another approach is to impute them.

Imputation means replacing missing values with meaningful substitutes so that valuable data is not lost.

What is Data Imputation?

Data imputation is the process of:

Replacing missing values with estimated or calculated values.

Example:


Before:
[10, NaN, 20, NaN, 30]

After (each NaN replaced by the mean of the known values, (10 + 20 + 30) / 3 = 20):
[10, 20, 20, 20, 30]
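The before/after example can be reproduced in a few lines. This is a minimal sketch that assumes the gaps are filled with the mean of the known values; the variable names `before`, `fill_value`, and `after` are illustrative.

```python
import numpy as np

before = np.array([10, np.nan, 20, np.nan, 30])

# Mean of the non-missing values: (10 + 20 + 30) / 3 = 20.0
fill_value = np.nanmean(before)

# np.where picks fill_value wherever the entry is NaN, otherwise keeps the original
after = np.where(np.isnan(before), fill_value, before)

print(after)  # [10. 20. 20. 20. 30.]
```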

Why Impute Missing Data?

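Preserve dataset size
Prevent data loss
Let tools that cannot handle NaN (including many machine learning models) use every record
Handle incomplete records
Keep the mean of the data unchanged (when the mean is used as the fill value)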

1. Creating an Array with Missing Values


import numpy as np

arr = np.array([10, 20, np.nan, 40, np.nan])

print(arr)

Output


[10. 20. nan 40. nan]
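Before imputing, it helps to confirm where the gaps are. The sketch below (an illustration, not part of the original example) shows why `np.isnan()` is needed to find NaN and why ordinary reductions such as `np.mean()` cannot compute a fill value on their own:

```python
import numpy as np

arr = np.array([10, 20, np.nan, 40, np.nan])

# NaN never compares equal to anything, including itself, so use np.isnan()
print(arr == np.nan)   # [False False False False False]
mask = np.isnan(arr)
print(mask)            # [False False  True False  True]
print(mask.sum())      # 2 missing values

# Regular reductions propagate NaN; the nan* variants skip missing entries
print(np.mean(arr))    # nan
print(np.nanmean(arr)) # mean of 10, 20 and 40, about 23.33
```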

2. Replacing Missing Values with Zero

The simplest imputation method. It works best when 0 is a meaningful value for the data.


import numpy as np

arr = np.array([10, 20, np.nan, 40])

imputed = np.nan_to_num(arr, nan=0)

print(imputed)

Output


[10. 20.  0. 40.]
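One detail worth knowing: `np.nan_to_num()` also replaces positive and negative infinity by default. A small sketch, using a hypothetical array that contains both NaN and infinities, comparing the default behavior, explicit `posinf`/`neginf` values, and a NaN-only alternative built with `np.where()`:

```python
import numpy as np

arr = np.array([10, np.nan, np.inf, -np.inf])

# By default, +inf/-inf become the largest/most negative finite float64
default = np.nan_to_num(arr, nan=0)
print(default)

# posinf/neginf let you choose explicit replacements for the infinities
explicit = np.nan_to_num(arr, nan=0, posinf=999, neginf=-999)
print(explicit)   # [  10.    0.  999. -999.]

# To replace only NaN and leave infinities untouched, use a mask instead
nan_only = np.where(np.isnan(arr), 0, arr)
print(nan_only)
```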

3. Imputing with Mean Value

A common statistical technique: each NaN is replaced by the mean of the non-missing values.


import numpy as np

arr = np.array([10, 20, np.nan, 40])

mean_value = np.nanmean(arr)

arr[np.isnan(arr)] = mean_value

print(arr)

Output


[10.         20.         23.33333333 40.        ]
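The example above overwrites `arr` in place. If you want to keep the raw data, a reasonable pattern is to impute a copy. The sketch below also shows a related pitfall: NaN is a floating-point value, so an integer array cannot hold it. The names `original`, `imputed`, and `raised` are illustrative.

```python
import numpy as np

original = np.array([10, 20, np.nan, 40])

# Work on a copy so the raw data (with its NaN markers) stays available
imputed = original.copy()
imputed[np.isnan(imputed)] = np.nanmean(imputed)

print(original)  # [10. 20. nan 40.]
print(imputed)

# NaN is a float value: assigning it into an integer array raises ValueError
ints = np.array([10, 20, 30])
raised = False
try:
    ints[1] = np.nan
except ValueError as err:
    raised = True
    print("ValueError:", err)
```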

4. Imputing with Median Value

Median is often better when data contains outliers.


import numpy as np

arr = np.array([10, 20, np.nan, 40])

median_value = np.nanmedian(arr)

arr[np.isnan(arr)] = median_value

print(arr)

Output


[10. 20. 20. 40.]
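To see why the median is more robust, compare both fill values on a hypothetical array with one extreme value (1000 is made up purely for illustration):

```python
import numpy as np

# Same kind of data, but with one extreme value (1000)
arr = np.array([10, 20, np.nan, 40, 1000])

mean_fill = np.nanmean(arr)      # (10 + 20 + 40 + 1000) / 4 = 267.5
median_fill = np.nanmedian(arr)  # middle of 10, 20, 40, 1000 -> (20 + 40) / 2 = 30.0

print(mean_fill, median_fill)    # 267.5 30.0
```

The single outlier drags the mean far above every typical value, while the median stays close to them.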

5. Imputing with Maximum Value


import numpy as np

arr = np.array([10, 20, np.nan, 40])

max_value = np.nanmax(arr)

arr[np.isnan(arr)] = max_value

print(arr)

Output


[10. 20. 40. 40.]

6. Imputing with Minimum Value


import numpy as np

arr = np.array([10, 20, np.nan, 40])

min_value = np.nanmin(arr)

arr[np.isnan(arr)] = min_value

print(arr)

Output


[10. 20. 10. 40.]
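Sections 3 to 6 differ only in which NaN-ignoring function computes the fill value, so they can be folded into one function. This is a sketch; `impute` and `STRATEGIES` are hypothetical names, not NumPy API, and the function returns a new array instead of modifying its input:

```python
import numpy as np

# Hypothetical mapping from a strategy name to NumPy's NaN-ignoring reduction
STRATEGIES = {
    "mean": np.nanmean,
    "median": np.nanmedian,
    "max": np.nanmax,
    "min": np.nanmin,
}

def impute(arr, strategy="mean"):
    """Return a copy of arr with every NaN replaced by the chosen statistic."""
    arr = np.asarray(arr, dtype=float)
    fill = STRATEGIES[strategy](arr)
    return np.where(np.isnan(arr), fill, arr)

data = np.array([10, 20, np.nan, 40])
for name in STRATEGIES:
    print(name, impute(data, name))
```

If every value is NaN, these functions return nan and emit a RuntimeWarning, so the result would still contain NaN.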

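7. Imputing Missing Values in a 2D Array

Without an axis argument, np.nanmean() computes a single mean over every non-missing value in the array, so all gaps receive the same value: (1 + 2 + 4 + 6) / 4 = 3.25.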


import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6]
])

mean_value = np.nanmean(arr)

arr[np.isnan(arr)] = mean_value

print(arr)

Output


[[1.   2.   3.25]
 [4.   3.25 6.  ]]
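If each row represents a separate record, you might prefer to fill each gap with its own row's mean instead of the global mean. A sketch using `axis=1` with `keepdims=True` so the row means broadcast back across the columns (`row_means` and `filled` are illustrative names):

```python
import numpy as np

arr = np.array([
    [1, 2, np.nan],
    [4, np.nan, 6]
])

# One mean per row; keepdims=True keeps shape (2, 1) so it broadcasts across columns
row_means = np.nanmean(arr, axis=1, keepdims=True)  # 1.5 and 5.0

# Fill each NaN with the mean of its own row
filled = np.where(np.isnan(arr), row_means, arr)
print(filled)
```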

8. Column-Wise Imputation

Replace missing values using the mean of each column.


import numpy as np

arr = np.array([
    [10, 20, np.nan],
    [40, np.nan, 60],
    [70, 80, 90]
])

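# Mean of each column, ignoring NaN: [40. 50. 75.]
col_means = np.nanmean(arr, axis=0)

# Row and column indices of every NaN
inds = np.where(np.isnan(arr))

# For each NaN, look up the mean of its column (inds[1] holds the column indices)
arr[inds] = np.take(col_means, inds[1])

print(arr)

Output


[[10. 20. 75.]
 [40. 50. 60.]
 [70. 80. 90.]]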
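The index-based approach above can also be written with broadcasting. This sketch (an equivalent alternative, not the original method) passes the 1D column means straight to `np.where()`, which lines them up with each column and leaves `arr` unchanged:

```python
import numpy as np

arr = np.array([
    [10, 20, np.nan],
    [40, np.nan, 60],
    [70, 80, 90]
])

# Column means have shape (3,), which broadcasts along each row
col_means = np.nanmean(arr, axis=0)   # [40. 50. 75.]

filled = np.where(np.isnan(arr), col_means, arr)
print(filled)
```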

9. Real-World Example: Student Scores


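import numpy as np

scores = np.array([80, 90, np.nan, 70, 85])

# Average of the four recorded scores: (80 + 90 + 70 + 85) / 4 = 81.25
avg_score = np.nanmean(scores)

scores[np.isnan(scores)] = avg_score

print(scores)

Output


[80.   90.   81.25 70.   85.  ]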
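Once values are imputed, they look identical to real measurements. A common practice, shown here as a sketch, is to keep a boolean mask of the original gaps so later analysis can tell which scores were estimated; `was_missing` and `scores_filled` are illustrative names:

```python
import numpy as np

scores = np.array([80, 90, np.nan, 70, 85])

# Record where values were missing before filling them
was_missing = np.isnan(scores)

scores_filled = np.where(was_missing, np.nanmean(scores), scores)

print(scores_filled)
print(np.flatnonzero(was_missing))  # [2]  (position of the imputed score)
```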

10. Real-World Example: Temperature Dataset


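import numpy as np

temps = np.array([30.5, np.nan, 29.0, 31.0])

# Mean of the three recorded temperatures: (30.5 + 29.0 + 31.0) / 3
mean_temp = np.nanmean(temps)

temps[np.isnan(temps)] = mean_temp

print(temps)

Output


[30.5        30.16666667 29.         31.        ]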
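For ordered measurements such as temperatures over time, linear interpolation between neighboring readings is a common alternative to a single mean. This technique is not one of the methods above; the sketch below assumes the readings are evenly spaced and uses `np.interp()`:

```python
import numpy as np

temps = np.array([30.5, np.nan, 29.0, 31.0])

missing = np.isnan(temps)
positions = np.arange(len(temps))

# Estimate each gap from the known readings on either side of it
filled = temps.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], temps[~missing])

print(filled)  # the gap at index 1 becomes 29.75, halfway between 30.5 and 29.0
```

For gaps at the very start or end of the array, `np.interp()` repeats the nearest known value rather than extrapolating.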

Common Imputation Techniques

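Technique	Description
Zero Imputation	Replace with 0 (np.nan_to_num)
Mean Imputation	Replace with the mean of the non-missing values (np.nanmean)
Median Imputation	Replace with the median of the non-missing values (np.nanmedian)
Maximum Imputation	Replace with the largest non-missing value (np.nanmax)
Minimum Imputation	Replace with the smallest non-missing value (np.nanmin)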

Choosing an Imputation Method

Method	Best Use Case
Mean	Roughly symmetric (e.g. normally distributed) data without extreme outliers
Median	Skewed data or data with outliers
Zero	Fields where 0 is a meaningful default
Max/Min	Special domain requirements

Advantages of Imputing Missing Data

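Keeps all records
Prevents information loss from the non-missing fields of each record
Lets NaN-intolerant models and functions run on the full dataset
Maintains dataset structure (array shapes stay the same)
Easy to implement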

Summary

Imputing missing data is an important preprocessing step in data science. NumPy provides NaN-aware functions such as nanmean(), nanmedian(), nanmax(), nanmin(), and nan_to_num() for computing fill values and replacing missing entries efficiently.

This functionality is part of NumPy and widely used in applications built with Python.

Conclusion

Instead of deleting valuable records, imputation allows you to intelligently replace missing values and preserve your dataset. By using NumPy's built-in tools, you can prepare clean and complete data for analysis, visualization, and machine learning.


Posted by: Roger John Williams
