NumPy – Union
In data analysis, we often need to combine two datasets and remove duplicates.
In NumPy, this is done with np.union1d(), which returns the sorted, unique values found in either of two input arrays.
It is widely used in:
- Data science
- Machine learning
- Database merging
- Data cleaning
- Set theory operations
What is Union?
Union means:
Combining the elements of two arrays so that every value present in either array appears exactly once.
Import NumPy
import numpy as np
1. Basic Union of Two Arrays
import numpy as np
A = np.array([1, 2, 3, 4])
B = np.array([3, 4, 5, 6])
result = np.union1d(A, B)
print(result)
Output:
[1 2 3 4 5 6]
2. Union with Duplicate Values
import numpy as np
A = np.array([1, 1, 2, 3])
B = np.array([3, 4, 4, 5])
result = np.union1d(A, B)
print(result)
Output:
[1 2 3 4 5]
3. Union of 2D Arrays
np.union1d() flattens its inputs to 1-D before computing the union, so the result is always a 1-D array.
import numpy as np
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[4, 5],
              [6, 7]])
result = np.union1d(A, B)
print(result)
Output:
[1 2 3 4 5 6 7]
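Because the inputs are flattened first, the two arrays do not need to share a shape. The following is a small sketch of that behavior, with illustrative values that are not from the example above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])      # shape (2, 2)
B = np.array([4, 5, 6])     # shape (3,)

result = np.union1d(A, B)
print(result)        # [1 2 3 4 5 6]
print(result.shape)  # (6,)

# Flattening by hand first gives the same values.
manual = np.union1d(A.ravel(), B.ravel())
print(np.array_equal(result, manual))  # True
```

If the row and column structure matters, it has to be tracked separately, since the output carries no information about the original shapes.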
4. Real Dataset Example
import numpy as np
users_A = np.array([101, 102, 103])
users_B = np.array([103, 104, 105])
all_users = np.union1d(users_A, users_B)
print(all_users)
Output:
[101 102 103 104 105]
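np.union1d() accepts exactly two arrays. When IDs come from more than two sources, one common approach is to fold the function across the inputs with functools.reduce. The sketch below assumes a hypothetical third array, users_C, that is not part of the example above:

```python
from functools import reduce

import numpy as np

users_A = np.array([101, 102, 103])
users_B = np.array([103, 104, 105])
users_C = np.array([105, 106, 101])  # hypothetical third source

# Apply np.union1d pairwise: union1d(union1d(A, B), C)
all_users = reduce(np.union1d, (users_A, users_B, users_C))
print(all_users)  # [101 102 103 104 105 106]
```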
5. Union vs Concatenation
import numpy as np
A = np.array([1, 2, 3])
B = np.array([3, 4, 5])
print("Union:", np.union1d(A, B))
print("Concatenate:", np.concatenate((A, B)))
Output:
Union: [1 2 3 4 5]
Concatenate: [1 2 3 3 4 5]
Difference:
| Operation | Behavior |
|---|---|
| Union | Removes duplicates and sorts the result |
| Concatenate | Keeps all values in their original order |
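One way to see how the two operations relate: for two arrays, np.union1d() behaves like concatenating them and then calling np.unique() on the result. The sketch below checks this on unsorted input chosen purely for illustration:

```python
import numpy as np

A = np.array([5, 1, 3])
B = np.array([3, 2, 5])

joined = np.concatenate((A, B))  # [5 1 3 3 2 5]: order and duplicates kept
deduped = np.unique(joined)      # [1 2 3 5]: sorted, duplicates removed

print(deduped)                                    # [1 2 3 5]
print(np.array_equal(deduped, np.union1d(A, B)))  # True
```

Note that the original element order is lost in both deduped and the union result, because the output is sorted.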
Set Operations in NumPy
NumPy provides the standard set operations for 1-D arrays:
- Union → np.union1d()
- Intersection → np.intersect1d()
- Difference → np.setdiff1d()
- Symmetric Difference → np.setxor1d()
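As a quick side-by-side comparison, the four functions can be run on the same pair of arrays. This sketch reuses the values from the basic union example; the variable names are only illustrative:

```python
import numpy as np

A = np.array([1, 2, 3, 4])
B = np.array([3, 4, 5, 6])

union = np.union1d(A, B)       # values in A or B
common = np.intersect1d(A, B)  # values in both A and B
only_a = np.setdiff1d(A, B)    # values in A but not in B
either = np.setxor1d(A, B)     # values in exactly one of A, B

print(union)   # [1 2 3 4 5 6]
print(common)  # [3 4]
print(only_a)  # [1 2]
print(either)  # [1 2 5 6]
```

Unlike the other three, np.setdiff1d() depends on argument order: np.setdiff1d(B, A) would return [5 6] here.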
Real-World Applications
1. Data Science
- Merging datasets
- Combining feature sets
2. Machine Learning
- Feature union
- Data preprocessing
3. Databases
- Record merging
- Unique ID collection
4. Web Applications
- User data aggregation
- Search indexing
Why Use NumPy Union?
Using NumPy provides:
- Fast set operations
- Automatic duplicate removal
- Efficient large-scale processing
- Clean and readable code
Within Python, this makes NumPy a powerful tool for data engineering and analysis.
Summary
NumPy allows efficient union operations using:
np.union1d(A, B)
It automatically removes duplicates and returns sorted unique values.
Conclusion
Union is a fundamental operation in data science for merging datasets. NumPy makes it fast, simple, and highly efficient for real-world applications.

