## required packages/modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
from matplotlib import patches
from IPython.display import display, HTML

## default font style
rcParams["font.family"] = "serif"

## format output
CSS = """
.output {
  margin-left: 20px;
}
"""

HTML('<style>{}</style>'.format(CSS))

Overview

One of the most effective ways to present data is through graphs and charts.
From a graph or chart, decision-makers can often get an overall picture of the data and reach useful conclusions just by studying it.
We classify data graphs as quantitative or qualitative.
Quantitative data graphs are plotted along a numerical scale.
Qualitative data graphs are plotted using non-numerical categories.

What is a Histogram?

One of the more widely used types of graphs for quantitative data is the histogram.
A histogram is a series of contiguous bars or rectangles that represents the frequency of data in given class intervals.

Construction

The first step is to mark the class boundaries on the x-axis (horizontal axis) and the frequencies on the y-axis (vertical axis).
Then construct a vertical rectangle on each line segment representing a class interval, such that the height of the rectangle equals the frequency of that class interval.
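Before anything can be drawn, the frequencies for each class interval have to be tallied. A minimal sketch of that step with `np.histogram` follows; the `values` list is hypothetical data made up for illustration and is not taken from these notes.

```python
import numpy as np

# Hypothetical raw observations (assumed for illustration only).
values = [31, 35, 42, 48, 49, 55, 58]

# Class boundaries: 30 - under 40, 40 - under 50, 50 - under 60.
boundaries = np.array([30, 40, 50, 60])

# np.histogram counts values in half-open intervals [left, right);
# only the last interval also includes its right edge.
frequencies, _ = np.histogram(values, bins=boundaries)

for left, right, freq in zip(boundaries[:-1], boundaries[1:], frequencies):
    print(f"{left} - under {right}: {freq}")
```

Each printed frequency is the height of the rectangle drawn over that class interval.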

Example

Suppose we have the following frequency distribution:

Class Intervals	Frequency
30.0 - under 40.0	1
40.0 - under 50.0	0
50.0 - under 60.0	5
60.0 - under 70.0	4
70.0 - under 80.0	15
80.0 - under 90.0	5
90.0 - under 100.0	7

Our first step is to locate class boundaries and frequencies.

def create_axis(
    xticks, yticks, xlim, ylim, 
    xlabel="Class Interval",
    ylabel="Frequency"
):
    """
    Function to create axis.
    
    Args:
        xticks (numpy.array): xtick values.
        yticks (numpy.array): ytick values.
        xlim (tuple): x-limit.
        ylim (tuple): y-limit.
        xlabel (str, optional): X label value.
        ylabel (str, optional): y label value.
    
    Returns:
        figure.Figure: figure object.
        axes.Axes: axes object.
    """
    ## create subplot
    fig, ax = plt.subplots(facecolor="#121212", figsize=(12,8))
    ax.set_facecolor("#121212")

    ## hide the top and right spines
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)

    ## change color
    ax.spines['bottom'].set_color("#F2F2F2")
    ax.spines['left'].set_color("#F2F2F2") 

    ## change color of tick params
    ax.tick_params(axis='x', colors="#F2F2F2")
    ax.tick_params(axis='y', colors="#F2F2F2")

    ## set ticks
    ax.set_xticks(np.round(xticks, 2))
    ax.set_yticks(np.round(yticks, 2))

    ## set labels
    ax.set_xlabel(xlabel, color="#F2F2F2", size=20)
    ax.set_ylabel(ylabel, color="#F2F2F2", size=20)

    ## setting the limit
    ax.set(xlim=xlim, ylim=ylim)

    ## credits
    fig.text(
        0.9, 0.02, "graphic: @slothfulwave612", 
        fontsize=10, fontstyle="italic", color="#F2F2F2",
        ha="right", va="center"
    )
    
    return fig, ax

fig, ax = create_axis(
    xticks=np.linspace(30,100,8), yticks=np.linspace(0,15,16), xlim=(30,101), ylim=(0,15)
)
plt.show()

Now that the class boundaries and frequencies are marked on the axes, we plot a bar for each class interval.
For our first class interval, the frequency is 1, so the top of the bar reaches the 1 mark on the y-axis, like this:

## plot first bin
ax.hist(
    x=[33], bins=[30,40], edgecolor="#F2F2F2", linewidth=1, color="#121212", hatch=1*"/"
)
fig

For our second class interval (i.e. 40 - under 50), the frequency is 0, so no bar is drawn.
For our third class interval (i.e. 50 - under 60), the frequency is 5, so the top of the bar reaches the 5 mark on the y-axis, like this:

## plot third bin
ax.hist(
    x=[52, 55, 53, 58, 57], bins=[50,60], edgecolor="#F2F2F2", 
    linewidth=1, color="#121212", hatch=1*'/'
)
fig

The process continues in the same way until the whole frequency table is plotted.

## all test scores
test_scores = [
    52, 92, 84, 74, 65, 55, 78, 95, 62, 72, 64, 
    74, 82, 94, 71, 79, 73, 94, 77, 53, 
    77, 87, 97, 57, 72, 89, 76, 91, 86, 
    99, 71, 73, 58, 76, 33, 78, 69
]

## create new plot
fig, ax = create_axis(
    xticks=np.linspace(30,100,8), yticks=np.linspace(0,15,16), xlim=(30,101), ylim=(0,15)
)

## plot whole histogram
ax.hist(
    x=test_scores, bins=np.linspace(30,100,8),
    edgecolor="#F2F2F2", linewidth=1, color="#121212", hatch=1*"/"
)

plt.show()

A histogram is a useful tool for comparing the frequencies of class intervals. A glance at a histogram reveals which class intervals have the highest frequencies.
- The figure above clearly shows that the class interval 70 - under 80 has by far the highest frequency (15).
Examining the histogram also reveals where large increases or decreases occur between adjacent classes.
- For example, from the 40 - under 50 class to the 50 - under 60 class there is an increase of 5; from the 60 - under 70 class to the 70 - under 80 class, an increase of 11; and from the 70 - under 80 class to the 80 - under 90 class, a decrease of 10.
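The increases and decreases between adjacent classes can also be read off numerically. A small sketch, using the frequencies from the table in the example and `np.diff` (the variable names are illustrative):

```python
import numpy as np

# Frequencies from the example table, 30 - under 40 up to 90 - under 100.
edges = np.arange(30, 101, 10)
frequencies = np.array([1, 0, 5, 4, 15, 5, 7])

# Change in frequency from each class to the next one.
changes = np.diff(frequencies)

for i, change in enumerate(changes):
    print(
        f"{edges[i]} - under {edges[i + 1]}  ->  "
        f"{edges[i + 1]} - under {edges[i + 2]}: {change:+d}"
    )
```

The largest jumps (+11 into the 70 - under 80 class and -10 out of it) are the ones that stand out visually in the histogram.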
Note: If you use different scales for the x-axis and y-axis, the resulting histogram will look different from the one plotted above. The example below uses different tick spacing and ten narrower class intervals:

## create new plot
fig, ax = create_axis(
    xticks=np.linspace(30, 100, 11), yticks=np.linspace(0, 15, 4), xlim=(30,101), ylim=(0,15)
)

## plot whole histogram
ax.hist(
    x=test_scores, bins=np.linspace(30, 100, 11),
    edgecolor="#F2F2F2", linewidth=1, color="#121212", hatch=1*"/"
)

plt.show()

Note: It is important that the user of the graph clearly understands the scales used for the axes of a histogram. Otherwise, a graph’s creator can “lie with statistics” by stretching or compressing a graph to make a point.
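To see how much the choice of class intervals alone changes the picture, the counts behind the two plots can be compared directly. This is a rough sketch that reuses the test scores and the two sets of bin edges from the cells above; the names `counts_7` and `counts_10` are introduced here only for illustration.

```python
import numpy as np

# Same test scores as in the example above.
test_scores = [
    52, 92, 84, 74, 65, 55, 78, 95, 62, 72, 64,
    74, 82, 94, 71, 79, 73, 94, 77, 53,
    77, 87, 97, 57, 72, 89, 76, 91, 86,
    99, 71, 73, 58, 76, 33, 78, 69
]

# Seven classes of width 10 versus ten classes of width 7.
counts_7, edges_7 = np.histogram(test_scores, bins=np.linspace(30, 100, 8))
counts_10, edges_10 = np.histogram(test_scores, bins=np.linspace(30, 100, 11))

print("width 10:", counts_7)
print("width 7: ", counts_10)
```

Both sets of counts describe the same 37 scores, yet the tallest bar has a different height in each, which is why the scales and class intervals of a histogram should always be stated.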

Histograms with non-uniform widths

The histograms plotted above have equal class widths.
What if the class widths are unequal? How do we create a histogram then? This section answers that question.
Let's first see what happens if we plot a histogram with unequal class widths the same way we plotted one with equal class widths.
Suppose our data looks like this:

Class Interval	Frequency
0 - under 10	10
10 - under 20	20
20 - under 40	30

Here the class width of the third class interval (20) is not equal to that of the other two (10).
If we draw it the same way as before, the final histogram looks like this:

data = [
    8, 6, 0, 4, 5, 3, 2, 4, 3, 5,
    10, 10, 17, 16, 13, 12, 18, 16, 10, 14, 18, 14, 14, 15, 15, 11, 16, 17, 10, 13,
    32, 39, 39, 30, 30, 23, 27, 37, 25, 23, 34, 38, 26, 28, 23, 39, 28,
    38, 20, 39, 20, 31, 29, 37, 38, 26, 20, 20, 21, 37
]

# create new plot
fig, ax = create_axis(
    xticks=np.linspace(0, 40, 5), yticks=np.linspace(0, 30, 4), xlim=(0,41), ylim=(0,31)
)

# plot whole histogram
ax.hist(
    x=data, bins=[0, 10, 20, 40],
    edgecolor="#F2F2F2", linewidth=1, color="#121212"
)

# annotate bars
ax.text(
    5, 5, "bar 1", size=15, ha="center", va="center", color="#F2F2F2"
)
ax.text(
    15, 10, "bar 2", size=15, ha="center", va="center", color="#F2F2F2"
)
ax.text(
    30, 15, "bar 3", size=15, ha="center", va="center", color="#F2F2F2"
)


plt.show()

The problem is that this is not a good way of representing the given data: the third bar looks far larger than its frequency warrants.
To show why the representation is misleading, let us draw some lines and annotate the sizes of the resulting rectangles.

# create new plot
fig, ax = create_axis(
    xticks=np.linspace(0, 40, 5), yticks=np.linspace(0, 30, 4), xlim=(0,41), ylim=(0,31)
)

# plot whole histogram
ax.hist(
    x=data, bins=[0, 10, 20, 40],
    edgecolor="#F2F2F2", linewidth=1, color="#121212"
)

ax.plot(
    [30, 30], [0, 20], lw=1, color="#F2F2F2", ls="--"
)
ax.plot(
    [20, 40], [20, 20], lw=1, color="#F2F2F2", ls="--"
)

a1 = patches.FancyArrowPatch(
    (20,21), (40,21), 
    arrowstyle="<|-|>,head_length=5,head_width=5", color="#F2F2F2", alpha=0.7
)
ax.add_patch(a1)

a1 = patches.FancyArrowPatch(
    (29,0), (29,20), 
    arrowstyle="<|-|>,head_length=5,head_width=5", color="#F2F2F2", alpha=0.7
)
ax.add_patch(a1)

ax.text(
    28.2, 11, "20 units", color="#F2F2F2", size=12, rotation=90,
    va="top"
)
ax.text(
    30, 21.5, "20 units", color="#F2F2F2", size=12, ha="center"
)

ax.text(
    30, 25, "Same area as bar 2", size=15, ha="center", color="#F2F2F2"
)
ax.text(
    35, 10, "Same area\nas\nbar 2", size=15, ha="center", color="#F2F2F2"
)
ax.text(
    25, 10, "Same area\nas\nbar 2", size=15, ha="center", color="#F2F2F2"
)
ax.text(
    15, 10, "bar 2", size=15, ha="center", color="#F2F2F2"
)

plt.show()

After breaking down bar 3, we can see that its area is three times the area of bar 2.
That means bar 3 appears to represent a number three times the number represented by bar 2.
Bar 2 represents a frequency of 20, so bar 3 should represent a frequency of 60. But that's not the case: bar 3 represents a frequency of 30, which is only 1.5 times 20.
So, this is not how we should draw a histogram whose class intervals have unequal widths.
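The mismatch can be checked with a few lines of arithmetic. A minimal sketch, using the class boundaries and frequencies from the table above (the variable names are illustrative):

```python
import numpy as np

bins = np.array([0, 10, 20, 40])
frequency = np.array([10, 20, 30])

# Class widths: 10, 10 and 20.
class_widths = np.diff(bins)

# Areas of the bars when the bar height is the raw frequency.
naive_areas = class_widths * frequency

# Bar 3 compared with bar 2: by area and by actual frequency.
area_ratio = naive_areas[2] / naive_areas[1]
freq_ratio = frequency[2] / frequency[1]

print("areas:", naive_areas)
print("area ratio (bar 3 / bar 2):", area_ratio)
print("frequency ratio (bar 3 / bar 2):", freq_ratio)
```

The area ratio is 3 while the frequency ratio is only 1.5, which is exactly the distortion visible in the plot.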
To solve this issue, we use the frequency density, which is calculated by the following equation:

$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$
Frequency density gives the frequency per unit of class width, where the unit is the unit of measurement of the data.
When we add a frequency density column to our data, the table looks like this:

Class Interval	Frequency	Frequency Density
0 - under 10	10	10/10 = 1
10 - under 20	20	20/10 = 2
20 - under 40	30	30/20 = 1.5

Now we can plot our histogram, with class intervals on the x-axis and frequency density on the y-axis.

# given data
data = [
    8, 6, 0, 4, 5, 3, 2, 4, 3, 5,
    10, 10, 17, 16, 13, 12, 18, 16, 10, 14, 18, 14, 14, 15, 15, 11, 16, 17, 10, 13,
    32, 39, 39, 30, 30, 23, 27, 37, 25, 23, 34, 38, 26, 28, 23, 39, 28,
    38, 20, 39, 20, 31, 29, 37, 38, 26, 20, 20, 21, 37
]

# bins
bins = np.array([0, 10, 20, 40])

# class-widths
class_widths = bins[1:] - bins[:-1]

# frequency
frequency = np.histogram(data, bins=bins)[0]

# frequency-density
freq_dens = frequency / class_widths

# create new plot
fig, ax = create_axis(
    xticks=np.linspace(0, 40, 5), yticks=np.linspace(0, 3, 4), xlim=(0,41), ylim=(0,3),
    ylabel="Frequency Density"
)

# plot bars
ax.fill_between(bins.repeat(2)[1:-1], freq_dens.repeat(2),
                fc="#121212", ec="#F2F2F2", hatch=1*'/', lw=1, zorder=1)

# plot lines
for i in range(0, len(freq_dens) - 1):
    ax.plot(
        [bins[i + 1], bins[i + 1]], [0, freq_dens[i]], color="#F2F2F2", zorder=2, lw=1
    )


plt.show()

Here, the area of each bar is equal to the frequency of its class interval.
- Area of bar 1 = 10 × 1 = 10
- Area of bar 2 = 10 × 2 = 20
- Area of bar 3 = 20 × 1.5 = 30
So, this is how we construct a histogram with unequal class widths.
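As a cross-check on the area rule, here is a rough sketch that draws the same bars with `ax.bar` and reads the areas back from the drawn rectangles. Note that this swaps in a different plotting call from the `fill_between` approach used in these notes, and the dark styling from `create_axis` is left out to keep the example short.

```python
import numpy as np
import matplotlib.pyplot as plt

bins = np.array([0, 10, 20, 40])
frequency = np.array([10, 20, 30])

class_widths = np.diff(bins)
freq_dens = frequency / class_widths

fig, ax = plt.subplots()

# align="edge" puts each bar's left edge on its lower class boundary,
# and width=class_widths gives every bar its own class width.
bars = ax.bar(
    bins[:-1], freq_dens, width=class_widths, align="edge",
    color="white", edgecolor="black"
)
ax.set_xlabel("Class Interval")
ax.set_ylabel("Frequency Density")

# Area of each drawn rectangle = width x height = frequency.
areas = np.array([bar.get_width() * bar.get_height() for bar in bars])
print(areas)
# In a notebook the figure is shown with `fig`; in a script, call plt.show().
```

The recovered areas match the frequencies in the table, so the relative sizes of the bars can be compared fairly.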

Conclusion

If the class intervals used along the horizontal axis are equal, then the height of the bars represents the frequency of values in a given class interval.
If the class intervals are unequal, then the heights of the bars show frequency density, and the areas of the bars are used for relative comparisons of class frequencies.

Questionnaire

Ques 01: Construct a histogram for the following data:

Class Interval	Frequency
30 - under 32	5
32 - under 34	7
34 - under 36	15
36 - under 38	21
38 - under 40	34
40 - under 42	24
42 - under 44	17
44 - under 46	8

Ques 02: Construct a histogram for the following data:

Class Interval	Frequency
0 - under 10	5
10 - under 20	7
20 - under 25	15
25 - under 30	21
30 - under 40	34
40 - under 60	24
60 - under 90	17
90 - under 100	8

1. Notes are compiled from TLMaths and Business Statistics by Ken Black.