NumPy Descriptive Statistics Explained – Python Mean, Variance, Distribution Tutorial

NumPy – Descriptive Statistics 

Descriptive statistics helps us summarize and understand data in a simple way.

It provides insights such as:

  • Central tendency (mean, median, mode)
  • Data spread (variance, standard deviation)
  • Distribution shape
  • Minimum and maximum values

Using NumPy, we can easily compute descriptive statistics on large datasets with fast and efficient functions.
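NumPy has no dedicated mode function, so the mode listed under central tendency needs a small workaround. The following is a minimal sketch, assuming a made-up sample with one repeated value, that finds the most frequent value with np.unique():

```python
import numpy as np

# Hypothetical sample data with one repeated value
data = np.array([10, 20, 20, 30, 40])

# np.unique returns the sorted distinct values and, with
# return_counts=True, how many times each one occurs
values, counts = np.unique(data, return_counts=True)

# The mode is the value with the highest count; on a tie,
# argmax picks the first (smallest) of the tied values
mode_value = values[np.argmax(counts)]

print(mode_value)  # 20
```

If several values share the highest count, this sketch reports only the smallest of them.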


What is Descriptive Statistics?

Descriptive statistics is the process of summarizing raw data into meaningful information.

It answers questions like:

  • What is the average value?
  • How spread out is the data?
  • What is the range?
  • Are there extreme values?

Why Is Descriptive Statistics Important?

It is used in:

  • Data analysis
  • Machine learning
  • Business intelligence
  • Finance
  • Healthcare
  • Research and surveys

Import NumPy

import numpy as np

1. Mean (Central Tendency)

import numpy as np

data = [10, 20, 30, 40, 50]

mean_value = np.mean(data)

print(mean_value)

Output

30.0

Explanation

The mean is the average of the dataset: the sum of all values divided by the number of values.
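To make the definition concrete, the sketch below recomputes the mean by hand and then shows the axis argument of np.mean() on a small two-dimensional array. The array named table is an illustrative assumption, not part of the original example.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Mean by definition: sum of the values divided by how many there are
manual_mean = data.sum() / data.size
print(manual_mean)  # 30.0

# Hypothetical 2-D data: 2 rows, 3 columns
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 averages down each column, axis=1 averages across each row
col_means = np.mean(table, axis=0)
row_means = np.mean(table, axis=1)

print(col_means)  # [2.5 3.5 4.5]
print(row_means)  # [2. 5.]
```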


2. Median (Middle Value)

import numpy as np

data = [10, 20, 30, 40, 50]

median_value = np.median(data)

print(median_value)

Output

30.0

Explanation

The median is the middle value of the sorted data. It is far less affected by outliers than the mean, which makes it useful for skewed data.
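The effect of an outlier is easy to see side by side. This sketch replaces the last value of the sample with a made-up extreme value (500) and compares how the mean and the median react:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Same data, but the largest value is replaced by a hypothetical outlier
with_outlier = [10, 20, 30, 40, 500]

mean_clean = np.mean(data)
mean_outlier = np.mean(with_outlier)
median_clean = np.median(data)
median_outlier = np.median(with_outlier)

print("Mean:  ", mean_clean, "->", mean_outlier)      # 30.0 -> 120.0
print("Median:", median_clean, "->", median_outlier)  # 30.0 -> 30.0
```

The mean quadruples while the median does not move at all.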


3. Standard Deviation (Data Spread)

import numpy as np

data = [10, 20, 30, 40, 50]

std_value = np.std(data)

print(std_value)

Output

14.142135623730951

Explanation

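The standard deviation shows how far values typically lie from the mean. By default, np.std() computes the population standard deviation (ddof=0); pass ddof=1 to get the sample standard deviation.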
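The choice between population and sample standard deviation is controlled by the ddof ("delta degrees of freedom") parameter of np.std(). A short sketch of the difference on the same data:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Default ddof=0: divide the squared deviations by N (population)
population_std = np.std(data)

# ddof=1: divide by N - 1 instead (sample estimate)
sample_std = np.std(data, ddof=1)

print(population_std)  # about 14.14
print(sample_std)      # about 15.81
```

The sample version is always slightly larger because it divides by a smaller number. Which one is appropriate depends on whether the data is the whole population or a sample drawn from it.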


4. Variance (Dispersion Measure)

import numpy as np

data = [10, 20, 30, 40, 50]

var_value = np.var(data)

print(var_value)

Output

200.0

Explanation

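Variance is the average of the squared deviations from the mean, which makes it the square of the standard deviation (14.142...² = 200). Like np.std(), np.var() uses ddof=0 (population variance) by default.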
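The calculation behind np.var() can be reproduced step by step. This is a sketch of the definition, not of NumPy's internal implementation:

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Step 1: deviation of every value from the mean
deviations = data - data.mean()

# Step 2: square the deviations and average them
manual_var = np.mean(deviations ** 2)

print(manual_var)    # 200.0
print(np.var(data))  # 200.0

# The standard deviation is the square root of the variance
std_from_var = np.sqrt(manual_var)
```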


5. Minimum and Maximum

import numpy as np

data = [10, 20, 30, 40, 50]

print("Min:", np.min(data))
print("Max:", np.max(data))

Output

Min: 10
Max: 50

Explanation

  • Minimum = smallest value
  • Maximum = largest value

6. Range of Data

import numpy as np

data = [10, 20, 30, 40, 50]

data_range = np.max(data) - np.min(data)

print(data_range)

Output

40

Explanation

The range is the difference between the highest and lowest values.
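NumPy also offers np.ptp() ("peak to peak"), which computes the same difference in a single call. A brief sketch comparing the two approaches:

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Range computed manually from max and min
manual_range = np.max(data) - np.min(data)

# np.ptp returns max - min directly
ptp_range = np.ptp(data)

print(manual_range)  # 40
print(ptp_range)     # 40
```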


7. Percentiles (Data Distribution)

import numpy as np

data = [10, 20, 30, 40, 50]

print(np.percentile(data, 25))
print(np.percentile(data, 50))
print(np.percentile(data, 75))

Output

20.0
30.0
40.0

Explanation

The p-th percentile is the value below which roughly p% of the data falls:

  • 25th percentile = first quartile (Q1)
  • 50th percentile = median
  • 75th percentile = third quartile (Q3)
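Percentiles are also a practical way to answer the earlier question about extreme values. The sketch below computes all three quartiles in one call, derives the interquartile range (IQR), and flags outliers with the common 1.5 × IQR rule. The rule and the sample containing 500 are assumptions made for illustration; other thresholds are equally valid.

```python
import numpy as np

# Hypothetical data containing one extreme value
values = np.array([10, 20, 30, 40, 500])

# Passing a list of percentiles returns all of them at once
q1, median, q3 = np.percentile(values, [25, 50, 75])

# Interquartile range: spread of the middle 50% of the data
iqr = q3 - q1

# Common convention: anything beyond 1.5 * IQR from the quartiles is an outlier
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print("IQR:", iqr)            # IQR: 20.0
print("Outliers:", outliers)  # Outliers: [500]
```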

8. Sum and Cumulative Sum

import numpy as np

data = [1, 2, 3, 4, 5]

print(np.sum(data))
print(np.cumsum(data))

Output

15
[ 1  3  6 10 15]

Explanation

  • Sum = total of all values
  • Cumulative sum = running total, where each element is the sum of all values up to that position
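Both functions accept an axis argument, which matters once the data has more than one dimension. A minimal sketch, assuming a small made-up table of two rows and three columns:

```python
import numpy as np

# Hypothetical 2-D data: 2 rows, 3 columns
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Without axis, np.sum adds up every element
total = np.sum(table)

# axis=0 sums down each column
column_totals = np.sum(table, axis=0)

# axis=1 keeps a running total along each row
row_running = np.cumsum(table, axis=1)

print(total)          # 21
print(column_totals)  # [5 7 9]
print(row_running)
```

The last entry of each running total equals the plain sum of that row.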

9. Correlation (Relationship Between Data)

import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(np.corrcoef(x, y))

Output

[[1. 1.]
 [1. 1.]]

Explanation

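np.corrcoef() returns a matrix of Pearson correlation coefficients. The diagonal entries are each variable's correlation with itself (always 1), and the off-diagonal entries are the correlation between x and y. Here y is exactly 2x, so the coefficient is 1. A value close to 1 means a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.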
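In practice the single coefficient is usually more useful than the full matrix. This sketch pulls it out by indexing and adds a made-up decreasing series (y_neg) to show the negative case:

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Hypothetical series that falls as x rises
y_neg = [10, 8, 6, 4, 2]

# Element [0, 1] of the matrix is the correlation between the two inputs
r_pos = np.corrcoef(x, y)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]

print(r_pos)  # close to 1: perfect positive relationship
print(r_neg)  # close to -1: perfect negative relationship
```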


10. Data Distribution Overview (Histogram)

import numpy as np

data = np.random.randn(1000)

hist, bins = np.histogram(data, bins=10)

print(hist)

Explanation

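np.histogram() splits the value range into equal-width bins and returns two arrays: hist, the number of values in each bin, and bins, the bin edges (11 edges for 10 bins). Because the data is random, the printed counts differ on every run, so no fixed output is shown.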
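For repeatable results, the random data can come from a seeded generator. The sketch below uses np.random.default_rng() with an arbitrary seed (42, chosen only for illustration) and inspects both arrays returned by np.histogram():

```python
import numpy as np

# Seeded generator so the same numbers are drawn on every run
rng = np.random.default_rng(42)
data = rng.standard_normal(1000)

# hist holds the count per bin, bin_edges the boundaries between bins
hist, bin_edges = np.histogram(data, bins=10)

print(hist)            # 10 counts, one per bin
print(len(bin_edges))  # 11: one more edge than there are bins
print(hist.sum())      # 1000: every value lands in exactly one bin
```

For normally distributed data, the largest counts should appear in the middle bins and taper off toward both ends.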


Real-World Applications

1. Data Science

  • Data summarization
  • Feature analysis
  • Data cleaning

2. Machine Learning

  • Feature scaling
  • Model evaluation
  • Data preprocessing

3. Finance

  • Risk analysis
  • Market trends
  • Portfolio evaluation

4. Healthcare

  • Patient data analysis
  • Medical research
  • Diagnosis insights

5. Business Analytics

  • Sales analysis
  • Customer behavior
  • Performance tracking

Common NumPy Descriptive Functions

Function           Purpose
np.mean()          Average
np.median()        Middle value
np.std()           Standard deviation
np.var()           Variance
np.min()           Minimum value
np.max()           Maximum value
np.percentile()    Data distribution
np.corrcoef()      Correlation
np.sum()           Total
np.cumsum()        Running total

Why Use NumPy for Descriptive Statistics?

Using NumPy provides:

  • Fast processing of large datasets
  • Simple statistical functions
  • High-performance computation
  • Easy integration with data tools

Together with the rest of the Python ecosystem, it provides a powerful environment for data analysis and scientific computing.


Summary

NumPy descriptive statistics includes:

  • np.mean()
  • np.median()
  • np.std()
  • np.var()
  • np.min()
  • np.max()
  • np.percentile()
  • np.corrcoef()

These functions help summarize and understand data effectively.
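These functions can be combined into a single summary. The helper below, named describe() purely for illustration (it is not a NumPy function), is one possible sketch of how to collect the main statistics into a dictionary:

```python
import numpy as np

def describe(data):
    """Hypothetical helper: return common descriptive statistics as a dict."""
    arr = np.asarray(data, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    return {
        "count": arr.size,
        "mean": np.mean(arr),
        "std": np.std(arr),  # population standard deviation (ddof=0)
        "min": np.min(arr),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": np.max(arr),
    }

summary = describe([10, 20, 30, 40, 50])

# Print one statistic per line
for name, value in summary.items():
    print(f"{name}: {value}")
```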


Conclusion

Descriptive statistics is the foundation of data analysis. NumPy provides powerful tools to quickly compute and analyze data summaries, helping developers, analysts, and scientists make better decisions based on data.



