NumPy – Descriptive Statistics
Descriptive statistics helps us summarize and understand data in a simple way.
It provides insights such as:
- Central tendency (mean, median, mode)
- Data spread (variance, standard deviation)
- Distribution shape
- Minimum and maximum values
Using NumPy, we can easily compute descriptive statistics on large datasets with fast and efficient functions.
What is Descriptive Statistics?
Descriptive statistics is the process of summarizing raw data into meaningful information.
It answers questions like:
- What is the average value?
- How spread out is the data?
- What is the range?
- Are there extreme values?
Why Is Descriptive Statistics Important?
It is used in:
- Data analysis
- Machine learning
- Business intelligence
- Finance
- Healthcare
- Research and surveys
Import NumPy
import numpy as np
1. Mean (Central Tendency)
import numpy as np
data = [10, 20, 30, 40, 50]
mean_value = np.mean(data)
print(mean_value)
Output
30.0
Explanation
The mean is the arithmetic average of the dataset: the sum of all values divided by the number of values.
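As a minimal sketch of what np.mean() computes, the snippet below reproduces the mean by hand and then averages along the rows and columns of a small 2-D array. The `scores` array is made up for illustration and is not part of the original example.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# The mean is the sum of the values divided by how many there are.
manual_mean = data.sum() / data.size
print(manual_mean)    # 30.0
print(np.mean(data))  # 30.0

# For 2-D data, `axis` selects the direction to average over.
# `scores` is a hypothetical array used only for this illustration.
scores = np.array([[10, 20, 30],
                   [40, 50, 60]])
col_means = np.mean(scores, axis=0)  # one mean per column
row_means = np.mean(scores, axis=1)  # one mean per row
print(col_means)  # [25. 35. 45.]
print(row_means)  # [20. 50.]
```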
2. Median (Middle Value)
import numpy as np
data = [10, 20, 30, 40, 50]
median_value = np.median(data)
print(median_value)
Output
30.0
Explanation
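The median is the middle value of the sorted data. It is affected far less by outliers than the mean, which makes it a better summary for skewed data.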
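To illustrate the effect of an outlier, the sketch below compares the mean and median of the original data with a copy in which the last value is replaced by 500. The `with_outlier` array is an assumption added for this comparison.

```python
import numpy as np

clean = np.array([10, 20, 30, 40, 50])
# Hypothetical variant: same data, but the last value is an extreme outlier.
with_outlier = np.array([10, 20, 30, 40, 500])

# Without outliers, mean and median agree.
print(np.mean(clean), np.median(clean))                # 30.0 30.0

# The outlier pulls the mean upward, while the median does not move.
print(np.mean(with_outlier), np.median(with_outlier))  # 120.0 30.0
```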
3. Standard Deviation (Data Spread)
import numpy as np
data = [10, 20, 30, 40, 50]
std_value = np.std(data)
print(std_value)
Output
14.142135623730951
Explanation
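Standard deviation shows how far values typically lie from the mean. By default, np.std() computes the population standard deviation (ddof=0, dividing by N). Pass ddof=1 to get the sample standard deviation, which divides by N - 1.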
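The difference between the population and sample versions can be sketched as follows. The rounding is only there to keep the printed values short.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Default (ddof=0): population standard deviation, divides by N.
population_std = np.std(data)

# ddof=1: sample standard deviation, divides by N - 1.
sample_std = np.std(data, ddof=1)

print(round(float(population_std), 4))  # 14.1421
print(round(float(sample_std), 4))      # 15.8114
```

The sample version is the one usually wanted when the data is a sample drawn from a larger population.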
4. Variance (Dispersion Measure)
import numpy as np
data = [10, 20, 30, 40, 50]
var_value = np.var(data)
print(var_value)
Output
200.0
Explanation
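Variance is the average of the squared deviations from the mean, and it equals the square of the standard deviation. Like np.std(), np.var() computes the population variance (ddof=0) by default.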
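A short sketch of how the variance can be reproduced step by step, and how it relates to the standard deviation:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Step 1: deviation of each value from the mean (-20, -10, 0, 10, 20).
deviations = data - data.mean()

# Step 2: average of the squared deviations.
manual_var = np.mean(deviations ** 2)
print(manual_var)  # 200.0

# The variance is the square of the standard deviation.
print(np.isclose(np.std(data) ** 2, np.var(data)))  # True

# Sample variance divides by N - 1 instead of N.
sample_var = np.var(data, ddof=1)
print(sample_var)  # 250.0
```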
5. Minimum and Maximum
import numpy as np
data = [10, 20, 30, 40, 50]
print("Min:", np.min(data))
print("Max:", np.max(data))
Output
Min: 10
Max: 50
Explanation
- Minimum = smallest value
- Maximum = largest value
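When the position of the extreme value matters as well as the value itself, np.argmin() and np.argmax() return the index of the smallest and largest element. The sketch below uses a reordered copy of the data (an assumption made so that the indices are not simply the first and last positions).

```python
import numpy as np

# Same values as above, shuffled so the extremes are not at the ends.
data = np.array([30, 10, 50, 20, 40])

min_index = np.argmin(data)  # position of the smallest value
max_index = np.argmax(data)  # position of the largest value

print(min_index, data[min_index])  # 1 10
print(max_index, data[max_index])  # 2 50
```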
6. Range of Data
import numpy as np
data = [10, 20, 30, 40, 50]
data_range = np.max(data) - np.min(data)
print(data_range)
Output
40
Explanation
The range is the difference between the highest and lowest values in the dataset.
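NumPy also offers np.ptp() ("peak to peak"), which computes the same max minus min in a single call. A minimal sketch, where the 2-D `table` array is a made-up example:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# np.ptp returns max - min in one call.
data_range = np.ptp(data)
print(data_range)  # 40

# With `axis`, the range is computed per row or per column.
# `table` is a hypothetical array used only for this illustration.
table = np.array([[1, 9, 4],
                  [7, 2, 8]])
row_ranges = np.ptp(table, axis=1)
print(row_ranges)  # [8 6]
```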
7. Percentiles (Data Distribution)
import numpy as np
data = [10, 20, 30, 40, 50]
print(np.percentile(data, 25))
print(np.percentile(data, 50))
print(np.percentile(data, 75))
Output
20.0
30.0
40.0
Explanation
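The p-th percentile is the value below which roughly p% of the data falls. NumPy interpolates linearly between neighboring values by default.
- 25th percentile = first quartile (Q1)
- 50th percentile = median (Q2)
- 75th percentile = third quartile (Q3)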
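np.percentile() also accepts a list of percentiles, which makes it easy to get all three quartiles at once and derive the interquartile range (IQR). This is a sketch; the second, four-element list is an assumption added to show the default interpolation.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Request several percentiles in one call.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)  # 20.0 30.0 40.0

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1
print(iqr)  # 20.0

# With an even number of values, the median falls between two
# data points and is interpolated linearly.
even_median = np.percentile([1, 2, 3, 4], 50)
print(even_median)  # 2.5
```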
8. Sum and Cumulative Sum
import numpy as np
data = [1, 2, 3, 4, 5]
print(np.sum(data))
print(np.cumsum(data))
Output
15
[ 1  3  6 10 15]
Explanation
- Sum = total of all values
- Cumulative sum = running total, where each element is the sum of all values up to that position
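The relationship between the two can be sketched as follows: the last element of the cumulative sum is the total, and dividing the running total by the running count gives a running mean (an illustrative extra, not part of the original example).

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
running = np.cumsum(data)

# The last running total equals the overall sum.
print(running[-1] == np.sum(data))  # True

# Differences between consecutive running totals recover
# the original values (except the first).
print(np.diff(running))  # [2 3 4 5]

# Running mean: running total divided by the number of values so far.
running_mean = running / np.arange(1, data.size + 1)
print(running_mean.tolist())  # [1.0, 1.5, 2.0, 2.5, 3.0]
```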
9. Correlation (Relationship Between Data)
import numpy as np
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(np.corrcoef(x, y))
Output
[[1. 1.]
 [1. 1.]]
Explanation
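np.corrcoef() returns the Pearson correlation matrix. The diagonal entries are always 1 (each variable compared with itself), and the off-diagonal entries hold the correlation between x and y. A value close to 1 means a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.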
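In practice only the single coefficient is usually needed, which can be read from position [0, 1] of the matrix. The sketch below also adds a decreasing series, `y_down`, as an assumed example of a negative correlation.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x
y_down = [10, 8, 6, 4, 2]  # hypothetical series that falls as x rises

# The off-diagonal entry [0, 1] is the correlation between the two inputs.
r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]

# Rounded to hide tiny floating-point error.
print(round(float(r_up), 6))    # 1.0
print(round(float(r_down), 6))  # -1.0
```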
10. Data Distribution Overview (Histogram)
import numpy as np
data = np.random.randn(1000)
hist, bins = np.histogram(data, bins=10)
print(hist)
Explanation
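A histogram shows how the data is distributed across value ranges (bins). np.histogram() returns two arrays: the number of values in each bin, and the bin edges (one more edge than there are bins). Because the data here is random, the printed counts differ on every run.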
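For reproducible results, a seeded generator can be used instead of np.random.randn(). The sketch below assumes a seed of 0 (an arbitrary choice) and checks the shape of what np.histogram() returns rather than the exact counts.

```python
import numpy as np

# A seeded generator makes the random data repeatable between runs.
rng = np.random.default_rng(seed=0)
data = rng.standard_normal(1000)

counts, bin_edges = np.histogram(data, bins=10)

# Every value lands in exactly one bin.
print(counts.sum())    # 1000

# 10 bins are delimited by 11 edges.
print(len(bin_edges))  # 11

# Midpoint of each bin, handy as x positions when plotting the counts.
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
print(len(bin_centers))  # 10
```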
Real-World Applications
1. Data Science
- Data summarization
- Feature analysis
- Data cleaning
2. Machine Learning
- Feature scaling
- Model evaluation
- Data preprocessing
3. Finance
- Risk analysis
- Market trends
- Portfolio evaluation
4. Healthcare
- Patient data analysis
- Medical research
- Diagnosis insights
5. Business Analytics
- Sales analysis
- Customer behavior
- Performance tracking
Common NumPy Descriptive Functions
| Function | Purpose |
|---|---|
| np.mean() | Average |
| np.median() | Middle value |
| np.std() | Standard deviation |
| np.var() | Variance |
| np.min() | Minimum value |
| np.max() | Maximum value |
| np.percentile() | Value at a given percentile |
| np.corrcoef() | Pearson correlation matrix |
| np.sum() | Total |
| np.cumsum() | Running total |
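The functions in the table can be combined into a small summary helper. The `describe` function below is a hypothetical convenience wrapper written for this illustration, not a NumPy API.

```python
import numpy as np

def describe(values):
    """Return common descriptive statistics as a dict (hypothetical helper)."""
    arr = np.asarray(values, dtype=float)
    return {
        "count": arr.size,
        "mean": float(np.mean(arr)),
        "median": float(np.median(arr)),
        "std": float(np.std(arr)),  # population standard deviation (ddof=0)
        "var": float(np.var(arr)),  # population variance (ddof=0)
        "min": float(np.min(arr)),
        "max": float(np.max(arr)),
        "q1": float(np.percentile(arr, 25)),
        "q3": float(np.percentile(arr, 75)),
    }

summary = describe([10, 20, 30, 40, 50])
for name, value in summary.items():
    print(f"{name}: {value}")
```

Returning a dictionary keeps the result easy to print, log, or convert into a table row.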
Why Use NumPy for Descriptive Statistics?
Using NumPy provides:
- Fast processing of large datasets
- Simple statistical functions
- High-performance computation
- Easy integration with data tools
Combined with the rest of the Python ecosystem, NumPy forms a powerful environment for data analysis and scientific computing.
Summary
NumPy's core descriptive statistics functions include:
- np.mean()
- np.median()
- np.std()
- np.var()
- np.min()
- np.max()
- np.percentile()
- np.corrcoef()
These functions help summarize and understand data effectively.
Conclusion
Descriptive statistics is the foundation of data analysis. NumPy provides powerful tools to quickly compute and analyze data summaries, helping developers, analysts, and scientists make better decisions based on data.

