NumPy – Statistical Functions
Statistics plays a key role in data analysis, machine learning, and scientific computing. It helps us understand data distribution, trends, and variability.
Using NumPy, we can easily perform statistical calculations on large datasets using simple and fast functions.
Why Statistical Functions Matter
Statistical functions help you:
- Understand data behavior
- Find central tendency
- Measure variability
- Detect outliers
- Make predictions
Import NumPy
import numpy as np
1. Mean (Average)
The mean is the sum of all values divided by the number of values.
import numpy as np
data = [10, 20, 30, 40, 50]
mean_value = np.mean(data)
print(mean_value)
Output
30.0
Explanation
Mean gives the central value of the dataset.
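As a quick check of the definition above, the sketch below computes the mean by hand and then uses the `axis` argument on a small 2-D array. The `scores` array is made up purely for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Mean by definition: sum of the values divided by how many there are.
manual_mean = data.sum() / data.size

# Hypothetical 2-D data: each row is one student, each column one test.
scores = np.array([[10, 20, 30],
                   [40, 50, 60]])
column_means = np.mean(scores, axis=0)  # one mean per column
row_means = np.mean(scores, axis=1)     # one mean per row

print(manual_mean)   # 30.0
print(column_means)  # [25. 35. 45.]
print(row_means)     # [20. 50.]
```

Without `axis`, `np.mean()` averages over every element of the array.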
2. Median
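The median is the middle value of the sorted data. If the dataset has an even number of values, it is the average of the two middle values.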
import numpy as np
data = [10, 20, 30, 40, 50]
median_value = np.median(data)
print(median_value)
Output
30.0
Explanation
The median is much less affected by extreme values than the mean, so it is useful when the data contains outliers.
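To see why the median copes better with outliers, here is a small sketch that compares it with the mean. The data is hypothetical: the same values as before, with the last one replaced by an extreme value.

```python
import numpy as np

# Hypothetical data: one extreme value pulls the mean upward.
data_with_outlier = [10, 20, 30, 40, 5000]

mean_value = np.mean(data_with_outlier)
median_value = np.median(data_with_outlier)

print(mean_value)    # 1020.0
print(median_value)  # 30.0

# With an even number of values, the two middle values are averaged.
even_median = np.median([10, 20, 30, 40])
print(even_median)   # 25.0
```

The single outlier moves the mean from 30 to 1020, while the median stays at 30.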
3. Standard Deviation
Standard deviation measures how spread out the data is around the mean.
import numpy as np
data = [10, 20, 30, 40, 50]
std_value = np.std(data)
print(std_value)
Output
14.142135623730951
Explanation
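A higher standard deviation means more variation in the data. By default, np.std() computes the population standard deviation (it divides by N); pass ddof=1 to get the sample standard deviation (divides by N - 1).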
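One detail worth knowing is the `ddof` parameter of `np.std()`. The sketch below compares the default (population) result with the sample version on the same data.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Default ddof=0: divide the squared deviations by N (population formula).
population_std = np.std(data)

# ddof=1: divide by N - 1 (sample formula).
sample_std = np.std(data, ddof=1)

print(f"{population_std:.4f}")  # 14.1421
print(f"{sample_std:.4f}")      # 15.8114
```

Which one is appropriate depends on whether the data is the whole population or a sample drawn from it.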
4. Variance
Variance is the square of the standard deviation, that is, the average of the squared deviations from the mean.
import numpy as np
data = [10, 20, 30, 40, 50]
var_value = np.var(data)
print(var_value)
Output
200.0
Explanation
Variance measures data spread in squared units of the data, which is why the standard deviation is often easier to interpret.
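The relationship between variance and standard deviation can be checked directly. This is a minimal sketch that computes the variance from its definition and compares it with the square of `np.std()`.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Variance by definition: mean of the squared deviations from the mean.
deviations = data - data.mean()
manual_var = np.mean(deviations ** 2)

print(manual_var)                                 # 200.0
print(np.isclose(manual_var, np.var(data)))       # True
print(np.isclose(manual_var, np.std(data) ** 2))  # True
```

`np.isclose()` is used instead of `==` because floating-point results can differ in the last digits.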
5. Minimum and Maximum Values
import numpy as np
data = [10, 20, 30, 40, 50]
print(np.min(data))
print(np.max(data))
Output
10
50
Explanation
- Min = smallest value
- Max = largest value
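If you also need to know where the smallest and largest values sit, `np.argmin()` and `np.argmax()` return their positions. A short sketch, using a shuffled copy of the data so the positions are not trivially first and last:

```python
import numpy as np

# Same values as above, in a different order (for illustration only).
data = [10, 50, 30, 20, 40]

min_index = np.argmin(data)  # position of the smallest value
max_index = np.argmax(data)  # position of the largest value

print(min_index, data[min_index])  # 0 10
print(max_index, data[max_index])  # 1 50
```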
6. Percentiles
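The p-th percentile is the value below which roughly p percent of the data falls. Together, the percentiles divide the sorted data into 100 equal parts.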
import numpy as np
data = [10, 20, 30, 40, 50]
print(np.percentile(data, 25))
print(np.percentile(data, 50))
print(np.percentile(data, 75))
Output
20.0
30.0
40.0
Explanation
- 25th percentile = first quartile (about a quarter of the values lie below it)
- 50th percentile = median
- 75th percentile = third quartile (about three quarters of the values lie below it)
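Percentiles are also a common starting point for spotting outliers. The sketch below uses the 1.5 × IQR rule, which is a widely used convention rather than anything specific to NumPy; the data and the 1.5 factor are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data with one suspiciously large value.
data = np.array([10, 20, 30, 40, 50, 200])

# np.percentile accepts a list and returns one value per percentile.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(q1, q3)    # 22.5 47.5
print(outliers)  # [200]
```

The quartiles here rely on NumPy's default linear interpolation between data points.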
7. Range of Data
import numpy as np
data = [10, 20, 30, 40, 50]
data_range = np.max(data) - np.min(data)
print(data_range)
Output
40
Explanation
The range is the difference between the largest and smallest values.
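NumPy also offers `np.ptp()` ("peak to peak"), which computes the same difference in one call. A minimal sketch:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Peak-to-peak: maximum minus minimum.
data_range = np.ptp(data)

print(data_range)  # 40
```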
8. Sum and Cumulative Sum
import numpy as np
data = [1, 2, 3, 4, 5]
print(np.sum(data))
print(np.cumsum(data))
Output
15
[ 1  3  6 10 15]
Explanation
- Sum = total
- Cumulative sum = running total
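A running total is handy for derived quantities. As one possible use, the sketch below divides the cumulative sum by the number of values seen so far to get a running mean.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

running_total = np.cumsum(data)

# Number of values included at each position: 1, 2, 3, ...
counts = np.arange(1, data.size + 1)
running_mean = running_total / counts

print(running_total[-1] == np.sum(data))  # True
print(running_mean.tolist())              # [1.0, 1.5, 2.0, 2.5, 3.0]
```

The last element of the cumulative sum always equals the overall sum.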
9. Correlation
Correlation measures the strength and direction of the linear relationship between two datasets.
import numpy as np
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
correlation = np.corrcoef(x, y)
print(correlation)
Output
[[1. 1.]
 [1. 1.]]
Explanation
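np.corrcoef() returns a correlation matrix. The diagonal entries are always 1 (each dataset correlates perfectly with itself), and the off-diagonal entries hold the correlation between x and y. A value close to 1 means a strong positive linear relationship, a value close to -1 means a strong negative one, and a value near 0 means little or no linear relationship.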
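In practice you usually want the single coefficient rather than the whole matrix. The sketch below pulls out the off-diagonal entry and adds a hypothetical decreasing dataset to show a negative correlation.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # hypothetical: falls as x rises

# Entry [0, 1] of the matrix is the correlation between the two inputs.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

print(f"{r_up:.2f}")    # 1.00
print(f"{r_down:.2f}")  # -1.00
```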
10. Histogram Data Analysis
import numpy as np
data = np.random.randn(1000)
hist, bins = np.histogram(data, bins=10)
print(hist)
Explanation
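A histogram shows how the data is distributed. np.histogram() returns two arrays: hist holds the number of values in each bin, and bins holds the bin edges (11 edges for 10 bins). Because the data is random, the counts differ on every run, so no output is shown here.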
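To make the result repeatable, you can seed the random generator. The sketch below uses `np.random.default_rng()` with an arbitrary seed (an assumption, any integer works) and checks the shape of what `np.histogram()` returns.

```python
import numpy as np

# Seeded generator so the same numbers are drawn on every run.
rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=10)

print(counts.sum())    # 1000 (every value lands in some bin)
print(len(counts))     # 10
print(len(bin_edges))  # 11 (one more edge than bins)
```

The exact counts per bin depend on the seed, so they are not listed here.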
Real-World Applications
1. Data Science
- Data analysis
- Feature engineering
- Data cleaning
2. Machine Learning
- Model evaluation
- Feature scaling
- Data normalization
3. Finance
- Risk analysis
- Market trends
- Portfolio management
4. Healthcare
- Patient data analysis
- Medical research
- Diagnosis support
5. Engineering
- Quality control
- Signal analysis
- System monitoring
Common NumPy Statistical Functions
| Function | Purpose |
|---|---|
| np.mean() | Average |
| np.median() | Middle value |
| np.std() | Standard deviation |
| np.var() | Variance |
| np.min() | Minimum |
| np.max() | Maximum |
| np.percentile() | Percentiles |
| np.corrcoef() | Correlation |
Why Use NumPy for Statistics?
Using NumPy provides:
- Fast computation on large datasets
- Easy syntax for complex operations
- High-performance numerical processing
- Integration with data science tools
Combined with the rest of the Python data ecosystem, it forms a powerful statistical computing environment.
Summary
NumPy statistical functions include:
np.mean()
np.median()
np.std()
np.var()
np.min()
np.max()
np.percentile()
np.corrcoef()
These functions help analyze and understand data efficiently.
Conclusion
Statistical functions are essential for understanding and interpreting data. NumPy provides a powerful and simple way to perform statistical analysis, making it a core tool for data science, machine learning, finance, and scientific research.

