🐍 NumPy – Splitting Arrays
In data analysis and scientific computing, large datasets are often divided into smaller parts for easier processing.
NumPy provides powerful functions for splitting arrays into multiple sub-arrays without manually slicing data.
Array splitting is commonly used in:
- Data preprocessing
- Machine learning
- Batch processing
- Dataset partitioning
- Scientific computing
By mastering array splitting, you can efficiently organize and process large amounts of data.
What is Array Splitting?
Array splitting means dividing a large array into multiple smaller arrays.
Example:
Original array:
[1, 2, 3, 4, 5, 6]Split into 3 parts:
[1, 2]
[3, 4]
[5, 6]NumPy provides several functions for this purpose.
Using np.split()
The split() function divides an array into equal-sized sub-arrays.
Syntax
np.split(array, sections, axis=0)Where:
- array = source array
- sections = number of splits or split indexes
- axis = dimension to split
Splitting a 1D Array
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
result = np.split(arr, 3)
print(result)Output:
[array([1, 2]),
array([3, 4]),
array([5, 6])]The array is divided into three equal parts.
Understanding Equal Division Requirement
split() requires equal division.
Example:
arr = np.array([1, 2, 3, 4, 5])
np.split(arr, 2)Output:
ValueErrorBecause:
5 ÷ 2 = not equalNumPy cannot create equal-sized parts.
Using np.array_split()
The array_split() function can split arrays even when equal division is not possible.
Example
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
result = np.array_split(arr, 2)
print(result)Output:
[array([1, 2, 3]),
array([4, 5])]Notice that NumPy distributes elements automatically.
Splitting Using Index Positions
Instead of specifying the number of sections, you can define split points.
import numpy as np
arr = np.array([10, 20, 30, 40, 50, 60])
result = np.split(arr, [2, 4])
print(result)Output:
[array([10, 20]),
array([30, 40]),
array([50, 60])]Split positions:
Index 2
Index 4Splitting 2D Arrays
Consider the following matrix:
import numpy as np
arr = np.array([
[1, 2],
[3, 4],
[5, 6],
[7, 8]
])Vertical Splitting
result = np.split(arr, 2)
print(result)Output:
[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]])]Rows are divided into two groups.
Splitting Along Columns
Use axis=1.
import numpy as np
arr = np.array([
[1, 2],
[3, 4],
[5, 6]
])
result = np.split(arr, 2, axis=1)
print(result)Output:
[array([[1],
[3],
[5]]),
array([[2],
[4],
[6]])]Using hsplit()
The hsplit() function performs horizontal splitting.
Example
import numpy as np
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
result = np.hsplit(arr, 2)
print(result)Output:
[array([[1, 2],
[5, 6]]),
array([[3, 4],
[7, 8]])]Columns are divided.
Using vsplit()
The vsplit() function performs vertical splitting.
Example
import numpy as np
arr = np.array([
[1, 2],
[3, 4],
[5, 6],
[7, 8]
])
result = np.vsplit(arr, 2)
print(result)Output:
[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]])]Rows are divided.
Using dsplit()
The dsplit() function splits along the third axis.
Example:
import numpy as np
arr = np.arange(16).reshape(2, 2, 4)
result = np.dsplit(arr, 2)
print(result)Output:
Two 3D arraysThis is useful when working with:
- RGB images
- Video frames
- Tensor data
Understanding Array Dimensions
Example:
arr.shapeOutput:
(2, 3)Meaning:
- 2 rows
- 3 columns
Splitting depends on which axis is selected.
Common Splitting Functions
| Function | Purpose |
|---|---|
| split() | Equal splitting |
| array_split() | Unequal splitting |
| hsplit() | Horizontal split |
| vsplit() | Vertical split |
| dsplit() | Depth split |
Split vs Array Split
| Feature | split() | array_split() |
| Equal division required | Yes | No |
| Flexible sizes | No | Yes |
| Error on uneven split | Yes | No |
| Most commonly used | Sometimes | Frequently |
Real-World Example
Splitting a dataset into training and testing sets.
import numpy as np
data = np.arange(100)
train, test = np.split(data, [80])
print(train.shape)
print(test.shape)Output:
(80,)
(20,)This approach is commonly used in machine learning projects.
Image Processing Example
RGB images contain three color channels:
Red
Green
BlueUsing dsplit():
red, green, blue = np.dsplit(image, 3)Each channel becomes a separate array.
Performance Considerations
Splitting arrays is efficient because NumPy often creates views rather than copying data.
Benefits:
- Lower memory usage
- Faster execution
- Better scalability
Best Practices
- Use
array_split()when sizes may not divide evenly. - Use
hsplit()for column operations. - Use
vsplit()for row operations. - Verify array shape before splitting.
- Use views when possible to save memory.
Common Errors
Unequal Split Error
arr = np.array([1,2,3,4,5])
np.split(arr, 2)Output:
ValueError:
array split does not result in an equal divisionSolution:
np.array_split(arr, 2)Summary
NumPy provides several powerful array splitting functions:
- split()
- array_split()
- hsplit()
- vsplit()
- dsplit()
These functions allow you to divide large arrays into smaller, manageable pieces for analysis and processing.
Conclusion
Array splitting is an essential NumPy skill for data manipulation and preprocessing. Whether you're preparing machine learning datasets, handling image channels, or organizing scientific data, NumPy's splitting functions provide efficient and flexible solutions.
Understanding when to use split(), array_split(), hsplit(), vsplit(), and dsplit() will help you write cleaner and more efficient Python code.


0 Comments