NumPy Finding Unique Rows Explained – Remove Duplicate Rows in Arrays

NumPy Finding Unique Rows

When working with large datasets, duplicate rows often appear due to:

  • Data entry errors
  • Data merging operations
  • Database exports
  • Sensor readings
  • Machine learning datasets

Before analyzing data, it's important to remove duplicates and keep only unique records.

NumPy provides an efficient way to accomplish this using:

np.unique()

Finding unique rows is a common task in data cleaning and preprocessing.


What are Unique Rows?

Unique rows are the distinct rows of a dataset: each different row is kept once, no matter how many times it appears.

Consider this array:

[[1 2]
 [3 4]
 [1 2]
 [5 6]]

The row:

[1 2]

appears twice.

Unique rows are:

[[1 2]
 [3 4]
 [5 6]]

Why Find Unique Rows?

Finding unique rows helps:

  • Remove duplicate records
  • Clean datasets
  • Reduce storage requirements
  • Avoid over-weighting repeated samples when training models
  • Prepare data for analysis

NumPy Function for Unique Rows

The primary function is:

np.unique()

For rows, use:

np.unique(array, axis=0)

Syntax

np.unique(
    array,
    axis=0
)

Parameters

Parameter    Description
array        Input array
axis=0       Finds unique rows

Basic Example

import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6]
])

unique_rows = np.unique(
    arr,
    axis=0
)

print(unique_rows)

Output

[[1 2]
 [3 4]
 [5 6]]

Understanding the Result

Original array:

[1 2]
[3 4]
[1 2]
[5 6]

Duplicate:

[1 2]

Result after removing the duplicate (np.unique returns the rows in sorted order):

[1 2]
[3 4]
[5 6]

Example with Multiple Duplicates

import numpy as np

arr = np.array([
    [10, 20],
    [30, 40],
    [10, 20],
    [30, 40],
    [50, 60]
])

print(
    np.unique(arr, axis=0)
)

Output

[[10 20]
 [30 40]
 [50 60]]

Finding Unique Rows and Their Counts

Use:

return_counts=True

Example:

import numpy as np

arr = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [5, 6],
    [5, 6]
])

rows, counts = np.unique(
    arr,
    axis=0,
    return_counts=True
)

print(rows)
print(counts)

Output

[[1 2]
 [3 4]
 [5 6]]
[2 1 2]

Explanation

Row frequencies:

Row       Count
[1, 2]    2
[3, 4]    1
[5, 6]    2
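
The counts can also be used as a filter. The sketch below is one possible way to separate rows that occur more than once from rows that occur exactly once; the names duplicated and singletons are illustrative and not part of NumPy.

```python
import numpy as np

arr = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [5, 6],
    [5, 6],
])

rows, counts = np.unique(arr, axis=0, return_counts=True)

# Boolean masks built from the counts select the matching unique rows.
duplicated = rows[counts > 1]    # rows that appear two or more times
singletons = rows[counts == 1]   # rows that appear exactly once

print(duplicated)
print(singletons)
```

Here duplicated holds [1, 2] and [5, 6], while singletons holds only [3, 4].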

Finding Original Row Indices

Use:

return_index=True

Example:

import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6]
])

rows, indices = np.unique(
    arr,
    axis=0,
    return_index=True
)

print(rows)
print(indices)

Output

[[1 2]
 [3 4]
 [5 6]]
[0 1 3]
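
Each index points to the first occurrence of a unique row in the original array. Because np.unique returns rows in sorted order, these indices are also a convenient way to get the unique rows back in their order of first appearance. The following is a minimal sketch of that idea; the input array and the name ordered_unique are made up for illustration.

```python
import numpy as np

arr = np.array([
    [5, 6],
    [1, 2],
    [5, 6],
    [3, 4],
])

# np.unique sorts the rows; return_index gives the position of each
# unique row's first occurrence in the original array.
_, first_idx = np.unique(arr, axis=0, return_index=True)

# Sorting those positions restores the order of first appearance.
ordered_unique = arr[np.sort(first_idx)]

print(ordered_unique)
```

With this input, plain np.unique(arr, axis=0) would start with [1, 2], whereas ordered_unique starts with [5, 6], the first row of the original array.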

Finding Inverse Mapping

Use:

return_inverse=True

Example:

import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2]
])

rows, inverse = np.unique(
    arr,
    axis=0,
    return_inverse=True
)

print(rows)
print(inverse)

Output

[[1 2]
 [3 4]]
[0 1 0]
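
The inverse array says, for every original row, which unique row it corresponds to. Indexing the unique rows with it should therefore rebuild the original array. The sketch below checks this; the ravel() call is a defensive assumption, added in case the installed NumPy version returns the inverse with an extra dimension.

```python
import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
])

rows, inverse = np.unique(arr, axis=0, return_inverse=True)

# Flatten the inverse so it can be used as a plain 1-D row index,
# then pick the matching unique row for every original position.
rebuilt = rows[inverse.ravel()]

print(np.array_equal(rebuilt, arr))
```

This mapping is useful when each row needs a compact integer label, for example to group records by their unique row.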

Real-World Example: Customer Database

import numpy as np

customers = np.array([
    [101, 25],
    [102, 30],
    [101, 25],
    [103, 28]
])

unique_customers = np.unique(
    customers,
    axis=0
)

print(unique_customers)

Output

[[101  25]
 [102  30]
 [103  28]]

Real-World Example: Sensor Data

import numpy as np

sensor = np.array([
    [100, 200],
    [100, 200],
    [150, 250]
])

print(
    np.unique(sensor, axis=0)
)

Output

[[100 200]
 [150 250]]
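
np.unique compares rows for exact equality. The example above uses integers, but real sensor readings are often floats that differ by small amounts of noise, and such rows are not treated as duplicates. One possible approach, sketched here with made-up readings and an assumed tolerance of two decimal places, is to round before deduplicating.

```python
import numpy as np

# Hypothetical float readings; the first two differ only by noise.
readings = np.array([
    [20.0,    1.5],
    [20.0004, 1.5001],
    [21.3,    1.7],
])

# Without rounding, all three rows count as distinct.
exact = np.unique(readings, axis=0)

# Rounding to 2 decimals (an assumed tolerance) merges the noisy pair.
rounded = np.unique(np.round(readings, decimals=2), axis=0)

print(exact.shape[0], rounded.shape[0])
```

The right number of decimals depends on the precision of the sensor, so treat the value 2 as a placeholder.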

Unique Rows vs Unique Elements

Unique Elements

np.unique(arr)

Returns:

The individual unique values of the flattened array, in sorted order

Unique Rows

np.unique(arr, axis=0)

Returns:

Entire unique rows
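
The difference is easiest to see on an array whose rows contain the same values in a different order. The array below is made up for illustration.

```python
import numpy as np

arr = np.array([
    [1, 2],
    [2, 1],
    [1, 2],
])

# Without axis, the array is flattened and single values are compared.
values = np.unique(arr)

# With axis=0, whole rows are compared, so [1, 2] and [2, 1] stay distinct.
rows = np.unique(arr, axis=0)

print(values)
print(rows)
```

Here values contains just 1 and 2, while rows keeps both [1, 2] and [2, 1].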

Performance Benefits

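NumPy's implementation:

  • Runs in compiled code instead of a Python-level loop
  • Sorts the rows to group duplicates, so cost grows roughly as n log n with the number of rows
  • Returns a new array and leaves the input unchanged
  • Scales to large datasets, though timing it on your own data is the only reliable comparison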
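
Performance depends on array size, dtype, and how many duplicates there are, so it is worth measuring rather than assuming. The sketch below compares np.unique against a pure-Python alternative built on a set of tuples; the helper names and the random test data are assumptions made for this example, and no timing results are claimed.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 5, size=(1000, 2))  # many repeated rows

def unique_rows_python(a):
    # Pure-Python route: turn rows into tuples, deduplicate with a set, sort.
    return sorted(set(map(tuple, a.tolist())))

def unique_rows_numpy(a):
    return np.unique(a, axis=0)

# Both routes should produce the same rows in the same (sorted) order.
same = unique_rows_numpy(data).tolist() == [list(r) for r in unique_rows_python(data)]
print("results match:", same)

# Timings vary by machine and data, so they are printed rather than assumed.
print("python:", timeit.timeit(lambda: unique_rows_python(data), number=20))
print("numpy: ", timeit.timeit(lambda: unique_rows_numpy(data), number=20))
```

Running this on data shaped like your own gives a more trustworthy answer than a general rule.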

Practical Applications

Finding unique rows is used in:

  • Data cleaning
  • Machine learning preprocessing
  • Database management
  • Log analysis
  • Customer records
  • Financial datasets
  • Scientific research

Advantages of Finding Unique Rows

  • Removes duplicate records
  • Improves data quality
  • Saves storage space
  • Speeds up analysis
  • Simplifies preprocessing

Summary

Finding unique rows in NumPy is simple using np.unique() with axis=0. This technique removes duplicate rows and returns only the distinct records, in sorted order, making it an essential tool for data cleaning and preprocessing.

This functionality is provided by NumPy and is widely used in data science workflows built with Python.


Conclusion

Understanding how to find unique rows is an important skill for working with real-world datasets. Whether you're cleaning customer records, preparing machine learning data, or analyzing scientific measurements, np.unique(arr, axis=0) provides a concise and efficient solution.



