Introduction
A histogram is one of the most important tools in data visualization.
It helps you understand:
- Data distribution
- Frequency of values
- Data patterns
- Outliers
When NumPy is combined with Matplotlib, creating histograms becomes simple and powerful.
What is a Histogram?
A histogram is a graphical representation of data distribution using bars.
Each bar represents:
The number of values that fall within a specific range. Each range is called a bin.
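To make the idea of counting values per bin concrete, here is a minimal sketch using np.histogram, which returns the counts and the bin edges without drawing anything. The sample values and the 0 to 10 range are made up for illustration.

```python
import numpy as np

# Small hand-made sample (illustrative values, not real data)
data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 9])

# Split the range 0 to 10 into four equal-width bins
counts, edges = np.histogram(data, bins=4, range=(0, 10))

# Pair each bin's left and right edge with its count
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left} to {right}: {count} values")
# 0.0 to 2.5: 3 values
# 2.5 to 5.0: 5 values
# 5.0 to 7.5: 1 values
# 7.5 to 10.0: 1 values
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.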
Why Use Histograms?
Histograms help you:
- Understand data distribution
- Detect skewness
- Find outliers
- Analyze frequency
- Prepare data for machine learning
NumPy + Matplotlib for Histogram
We use:
- NumPy → Generate or manage data
- Matplotlib → Plot histogram
1. Installing Matplotlib
pip install matplotlib
2. Basic Histogram Example
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
plt.hist(data)
plt.title("Basic Histogram")
plt.show()
3. Histogram with Bins
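Bins define how data is grouped. Passing an integer to bins splits the data range into that many equal-width intervals. If you leave it out, Matplotlib uses 10 bins by default.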
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
plt.hist(data, bins=20)
plt.title("Histogram with 20 Bins")
plt.show()
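The number of bins changes how wide each bin is and how many values land in it. The sketch below compares a few bin counts on the same sample and then lets NumPy suggest the edges with bins="auto". The seed and the bin counts tried here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
data = rng.standard_normal(1000)

# Fewer bins means wider bins and taller bars
for n_bins in (5, 20, 100):
    counts, edges = np.histogram(data, bins=n_bins)
    width = edges[1] - edges[0]
    print(f"{n_bins:>3} bins: width {width:.3f}, fullest bin holds {counts.max()} values")

# Let NumPy pick the edges instead of choosing a count by hand
auto_edges = np.histogram_bin_edges(data, bins="auto")
print("bins='auto' picked", len(auto_edges) - 1, "bins")
```

The same "auto" string can be passed to plt.hist as the bins argument.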
4. Custom Bins Example
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
bins = [-3, -2, -1, 0, 1, 2, 3]
plt.hist(data, bins=bins)
plt.title("Custom Bins Histogram")
plt.show()
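One thing to watch with custom bins: values outside the first and last edge are not counted at all. A small sketch with hand-picked values (chosen only to show the effect) makes this visible:

```python
import numpy as np

# Two of these values (-3.5 and 4.2) lie outside the bin edges
data = np.array([-3.5, -2.5, -0.5, 0.0, 0.4, 1.0, 2.9, 3.0, 4.2])
bins = [-3, -2, -1, 0, 1, 2, 3]

counts, edges = np.histogram(data, bins=bins)

print(counts)  # [1 0 1 2 1 2]
print(counts.sum(), "of", data.size, "values were counted")  # 7 of 9 values were counted
```

If every value should appear in the plot, make sure the first and last edges cover the minimum and maximum of the data.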
5. Histogram with NumPy Data
import numpy as np
import matplotlib.pyplot as plt
data = np.array([10, 20, 20, 30, 30, 30, 40, 50, 50])
plt.hist(data, bins=5)
plt.title("NumPy Array Histogram")
plt.show()
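To see which numbers sit behind the bars for this array, the same data can be passed to np.histogram. This is a sketch for inspection only; it uses the same bins=5 setting as the plot above.

```python
import numpy as np

data = np.array([10, 20, 20, 30, 30, 30, 40, 50, 50])

# Five equal-width bins between the minimum (10) and maximum (50)
counts, edges = np.histogram(data, bins=5)

print(edges)   # [10. 18. 26. 34. 42. 50.]
print(counts)  # [1 2 3 1 2]
```

The two values of 50 fall into the last bin because the last bin includes its right edge.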
6. Normal Distribution Histogram
import numpy as np
import matplotlib.pyplot as plt
data = np.random.normal(0, 1, 1000)
plt.hist(data, bins=30)
plt.title("Normal Distribution Histogram")
plt.show()
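The three arguments of np.random.normal are the mean, the standard deviation, and the number of samples. A quick check of the sample, sketched below with NumPy's seeded Generator API (the seed is an arbitrary choice), shows that the data behaves as expected before plotting it:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable

# loc is the mean, scale is the standard deviation, size is the sample count
data = rng.normal(loc=0, scale=1, size=1000)

# Share of values within one standard deviation of the mean
within_one_std = np.mean(np.abs(data) <= 1)

print(f"sample mean: {data.mean():.3f}")
print(f"sample std:  {data.std():.3f}")
print(f"share within one standard deviation: {within_one_std:.1%}")
```

For a normal distribution, roughly 68% of the values should fall within one standard deviation of the mean, which is why the histogram is tallest around 0.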
7. Histogram with Density Curve
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
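# density=True rescales the bars so that their total area is 1
plt.hist(data, bins=30, density=True)
# Overlay the standard normal density that randn samples from
x = np.linspace(-4, 4, 200)
plt.plot(x, np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))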
plt.title("Density Histogram")
plt.show()
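With density=True, the bar heights are no longer counts. They are scaled so that the total area of the bars equals 1. A minimal sketch to confirm this, using np.histogram with the same settings (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
data = rng.standard_normal(1000)

density, edges = np.histogram(data, bins=30, density=True)

# Area of each bar = height * width; the areas add up to 1
widths = np.diff(edges)
area = np.sum(density * widths)

print(f"total area: {area:.6f}")
```

This is why a density histogram can be compared directly with a probability density function.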
8. Colored Histogram
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
plt.hist(data, bins=25, color="skyblue")
plt.title("Colored Histogram")
plt.show()
9. Histogram with Edge Color
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(1000)
plt.hist(data, bins=25, edgecolor="black")
plt.title("Histogram with Edges")
plt.show()
10. Multiple Histograms
import numpy as np
import matplotlib.pyplot as plt
data1 = np.random.randn(1000)
data2 = np.random.randn(1000)
plt.hist(data1, bins=30, alpha=0.5)
plt.hist(data2, bins=30, alpha=0.5)
plt.title("Multiple Histograms")
plt.show()
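When two histograms are drawn with bins=30 separately, each one gets its own bin edges, so the bars do not line up exactly. One way to handle this, sketched below, is to compute a single set of edges from both samples. The second sample's mean and spread are hypothetical values chosen so the two groups differ.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
data1 = rng.normal(loc=0.0, scale=1.0, size=1000)
data2 = rng.normal(loc=1.0, scale=1.5, size=1000)  # hypothetical shifted, wider sample

# One set of 30 bins spanning both samples
combined = np.concatenate([data1, data2])
shared_edges = np.histogram_bin_edges(combined, bins=30)

# Both samples are now counted over identical intervals
counts1, _ = np.histogram(data1, bins=shared_edges)
counts2, _ = np.histogram(data2, bins=shared_edges)

print(counts1.sum(), counts2.sum())
```

To plot with these edges, pass bins=shared_edges to both plt.hist calls.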
Understanding Histogram Components
| Component | Meaning |
|---|---|
| Data | Input values |
| Bins | Intervals of grouping |
| Frequency | Count of values in each bin |
| Bar Height | Shows the bin's frequency (or its density when density=True) |
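These components can be read back from plt.hist, which returns the counts, the bin edges, and the drawn bars. The sketch below assumes the non-interactive "Agg" backend so that it runs without opening a window. In a normal session you can drop that line and call plt.show() as usual.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display needed for this sketch
import matplotlib.pyplot as plt
import numpy as np

data = np.array([10, 20, 20, 30, 30, 30, 40, 50, 50])  # Data: input values

# plt.hist returns the frequencies, the bin edges, and the bar objects
counts, edges, patches = plt.hist(data, bins=5)
heights = [bar.get_height() for bar in patches]

print("Bins (edges):", edges)
print("Frequency:   ", counts)
print("Bar heights: ", heights)
```

The bar heights match the frequencies one to one, which is what the table above describes.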
Real-World Example: Exam Scores
import numpy as np
import matplotlib.pyplot as plt
scores = np.random.randint(0, 101, 200)  # upper bound is exclusive, so 101 allows a score of 100
plt.hist(scores, bins=10, edgecolor="black")
plt.title("Exam Score Distribution")
plt.show()
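For exam scores, custom bin edges are often easier to read than equal-width bins. The sketch below groups scores into grade bands. The band edges, the grade labels, and the scores are all hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical scores and grade bands
scores = np.array([45, 58, 62, 67, 71, 75, 78, 83, 88, 91, 95, 100])
band_edges = [0, 60, 70, 80, 90, 100]
grades = ["F", "D", "C", "B", "A"]

counts, _ = np.histogram(scores, bins=band_edges)

for grade, count in zip(grades, counts):
    print(f"{grade}: {count} students")
# F: 2 students
# D: 2 students
# C: 3 students
# B: 2 students
# A: 3 students
```

The same band_edges list can be passed to plt.hist as the bins argument to draw one bar per grade band.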
Real-World Example: Website Traffic
import numpy as np
import matplotlib.pyplot as plt
visits = np.random.randint(1, 1000, 500)
plt.hist(visits, bins=20)
plt.title("Website Traffic Distribution")
plt.show()
Advantages of Histograms
- Easy data understanding
- Shows distribution clearly
- Detects patterns
- Identifies outliers
- Useful in ML preprocessing
Summary
Histograms using NumPy and Matplotlib help visualize how data is distributed across different ranges. With simple code, you can analyze large datasets and understand patterns easily.
Under the hood, Matplotlib's plt.hist uses NumPy to compute the bin counts and then draws them as bars.
Conclusion
Understanding histograms is essential for data analysis and machine learning. NumPy generates and manages the data, while Matplotlib turns it into meaningful visual insights.

