NumPy Finding Unique Rows

Duplicate rows are common in large datasets. Typical sources include:

Data entry errors
Data merging operations
Database exports
Repeated sensor readings
Machine learning datasets combined from several sources

Before analyzing data, it's important to remove duplicates and keep only unique records.

NumPy provides an efficient way to accomplish this using:

np.unique()

Finding unique rows is a common task in data cleaning and preprocessing.

What are Unique Rows?

Unique rows are the distinct rows of a dataset: each row is kept once, no matter how many times it appears.

Consider this array:


[[1 2]
 [3 4]
 [1 2]
 [5 6]]

The row:


[1 2]

appears twice.

Unique rows are:


[[1 2]
 [3 4]
 [5 6]]

Why Find Unique Rows?

Finding unique rows helps:

Remove duplicate records
Clean datasets
Reduce storage requirements
Avoid skewing models trained on repeated samples
Prepare data for analysis

NumPy Function for Unique Rows

The primary function is:


np.unique()

For rows, use:


np.unique(array, axis=0)

Syntax


np.unique(
    array,
    axis=0
)

Parameters

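Parameter	Description
array	Input array (two-dimensional when looking for unique rows)
axis=0	Compares along the first axis, so each row is treated as a single item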

Basic Example


import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6]
])

unique_rows = np.unique(
    arr,
    axis=0
)

print(unique_rows)

Output


[[1 2]
 [3 4]
 [5 6]]

Understanding the Result

Original array:


[1 2]
[3 4]
[1 2]
[5 6]

Duplicate:


[1 2]

Result with the duplicate removed (np.unique returns the rows in sorted order):


[1 2]
[3 4]
[5 6]
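
Because the input above is already sorted, it does not show that np.unique reorders rows. A minimal sketch with a deliberately unsorted array (the values and variable names are illustrative, not from the original example):

```python
import numpy as np

# Rows are deliberately out of order; [5, 6] comes first and is repeated.
arr = np.array([
    [5, 6],
    [1, 2],
    [5, 6],
    [3, 4]
])

unique_rows = np.unique(arr, axis=0)

# Rows come back sorted by first column, then by second column,
# not in the order they appeared in the input.
print(unique_rows)
```

Here [1 2] is printed first even though [5 6] was the first row of the input.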

Example with Multiple Duplicates


import numpy as np

arr = np.array([
    [10, 20],
    [30, 40],
    [10, 20],
    [30, 40],
    [50, 60]
])

print(
    np.unique(arr, axis=0)
)

Output


[[10 20]
 [30 40]
 [50 60]]

Finding Unique Rows and Their Counts

To count how many times each unique row occurs, use:


return_counts=True

Example:


import numpy as np

arr = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [5, 6],
    [5, 6]
])

rows, counts = np.unique(
    arr,
    axis=0,
    return_counts=True
)

print(rows)
print(counts)

Output


[[1 2]
 [3 4]
 [5 6]]
[2 1 2]

Explanation

Row frequencies:

Row	Count
[1, 2]	2
[3, 4]	1
[5, 6]	2
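
The counts can also be used to separate rows that repeat from rows that occur exactly once. A small sketch, assuming the same array as above (the names duplicated_rows and single_rows are illustrative):

```python
import numpy as np

arr = np.array([
    [1, 2],
    [1, 2],
    [3, 4],
    [5, 6],
    [5, 6]
])

rows, counts = np.unique(arr, axis=0, return_counts=True)

# counts lines up with rows, so boolean masks on counts select rows.
duplicated_rows = rows[counts > 1]   # rows that occur more than once
single_rows = rows[counts == 1]      # rows that occur exactly once

print(duplicated_rows)
print(single_rows)
```

With this input, duplicated_rows holds [1 2] and [5 6], and single_rows holds only [3 4].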

Finding Original Row Indices

To get the position of the first occurrence of each unique row in the original array, use:


return_index=True

Example:


import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2],
    [5, 6]
])

rows, indices = np.unique(
    arr,
    axis=0,
    return_index=True
)

print(rows)
print(indices)

Output


[[1 2]
 [3 4]
 [5 6]]
[0 1 3]
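
One common use of these indices is to drop duplicates while keeping the rows in their original order, since np.unique itself returns them sorted. A minimal sketch with an unsorted array (the names first_idx and in_order are illustrative):

```python
import numpy as np

arr = np.array([
    [5, 6],
    [1, 2],
    [5, 6],
    [3, 4]
])

# first_idx[i] is the position of the first occurrence of the i-th sorted unique row.
_, first_idx = np.unique(arr, axis=0, return_index=True)

# Sorting those positions restores the order in which rows first appeared.
in_order = arr[np.sort(first_idx)]
print(in_order)
```

The result keeps [5 6] first, followed by [1 2] and [3 4], matching the input order.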

Finding Inverse Mapping

To get, for every original row, the index of its matching row in the unique result, use:


return_inverse=True

Example:


import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2]
])

rows, inverse = np.unique(
    arr,
    axis=0,
    return_inverse=True
)

print(rows)
print(inverse)

Output


[[1 2]
 [3 4]]
[0 1 0]
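
The inverse mapping is enough to rebuild the original array from the unique rows. A short sketch of that round trip; the reshape is a precaution added here in case a given NumPy version returns the inverse with extra dimensions:

```python
import numpy as np

arr = np.array([
    [1, 2],
    [3, 4],
    [1, 2]
])

rows, inverse = np.unique(arr, axis=0, return_inverse=True)

# Flatten defensively so inverse is a plain 1-D index array.
inverse = np.asarray(inverse).reshape(-1)

# Indexing the unique rows with the inverse reproduces the input.
rebuilt = rows[inverse]
print(np.array_equal(rebuilt, arr))
```

This is useful when each row should be replaced by a compact integer label, for example to encode categories.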

Real-World Example: Customer Database


import numpy as np

customers = np.array([
    [101, 25],
    [102, 30],
    [101, 25],
    [103, 28]
])

unique_customers = np.unique(
    customers,
    axis=0
)

print(unique_customers)

Output


[[101  25]
 [102  30]
 [103  28]]

Real-World Example: Sensor Data


import numpy as np

sensor = np.array([
    [100, 200],
    [100, 200],
    [150, 250]
])

print(
    np.unique(sensor, axis=0)
)

Output


[[100 200]
 [150 250]]
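
Real sensor values are often floating-point, and np.unique compares values exactly, so two readings that differ only by tiny noise count as different rows. One possible workaround, sketched here with hypothetical readings and an assumed precision of two decimals, is to round before deduplicating:

```python
import numpy as np

# Hypothetical readings: the first two rows differ only by tiny noise.
readings = np.array([
    [20.10, 30.50],
    [20.1000001, 30.50],
    [21.00, 31.00]
])

# Exact comparison treats the noisy row as distinct.
exact = np.unique(readings, axis=0)

# Rounding first (2 decimals is an assumption) merges near-identical rows.
rounded = np.unique(np.round(readings, 2), axis=0)

print(len(exact))
print(len(rounded))
```

Rounding is a blunt tool: two values that sit on either side of a rounding boundary can still end up in different rows, so the right precision depends on the sensor.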

Unique Rows vs Unique Elements

Unique Elements


np.unique(arr)

Returns:


The sorted unique values of the flattened array

Unique Rows


np.unique(arr, axis=0)

Returns:


The sorted unique rows, each kept whole
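
The difference is easiest to see on one array. A small sketch in which two rows hold the same values in a different order (the array is illustrative):

```python
import numpy as np

arr = np.array([
    [1, 2],
    [2, 1],
    [1, 2]
])

# Without axis, the array is flattened and individual values are compared.
values = np.unique(arr)

# With axis=0, whole rows are compared, so [1, 2] and [2, 1] stay distinct.
rows = np.unique(arr, axis=0)

print(values)
print(rows)
```

Here values contains just 1 and 2, while rows contains both [1 2] and [2 1].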

Performance Benefits

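NumPy's implementation is:

Vectorized, so the row comparisons run in compiled code rather than in a Python loop
Sort-based, so the cost grows roughly as O(n log n) in the number of rows
Usually faster than an explicit Python loop, although the gain depends on array size and dtype
Suitable for large datasets that fit in memory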
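
For comparison, the same result can be produced in plain Python by hashing each row as a tuple. The sketch below only checks that both approaches agree on randomly generated data; it makes no timing claim, and measuring with timeit on your own data is the reliable way to compare speed:

```python
import numpy as np

# Random integer rows drawn from a small range, so repeats are guaranteed.
rng = np.random.default_rng(0)
arr = rng.integers(0, 5, size=(1000, 2))

# Vectorized NumPy version.
unique_np = np.unique(arr, axis=0)

# Pure-Python baseline: turn rows into tuples, deduplicate with a set,
# then sort so the order matches np.unique.
unique_py = sorted(set(map(tuple, arr.tolist())))

print(np.array_equal(unique_np, np.array(unique_py)))
```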

Practical Applications

Finding unique rows is used in:

Data cleaning
Machine learning preprocessing
Database management
Log analysis
Customer records
Financial datasets
Scientific research

Advantages of Finding Unique Rows

Removes duplicate records
Improves data quality
Saves storage space
Speeds up analysis
Simplifies preprocessing

Summary

Finding unique rows in NumPy is simple using np.unique() with axis=0. This technique removes duplicate rows and returns the distinct records in sorted order, making it an essential tool for data cleaning and preprocessing.

This functionality is provided by NumPy and is widely used in data science workflows built with Python.

Conclusion

Understanding how to find unique rows is an important skill for working with real-world datasets. Whether you're cleaning customer records, preparing machine learning data, or analyzing scientific measurements, np.unique(arr, axis=0) provides a fast and efficient solution.

Posted by: Roger John Williams
