# Histogram

A histogram is a way to visually represent quantitative data. It is especially useful for describing data with considerable variability, because it divides the range of the dataset into equal-width groups, called bins, counts how many values fall in each bin, and displays those counts as bars.

To see how a histogram is created, take the following dataset. For this example, we will use a bin width of 5. With a minimum value of 1, our bins are 1 – 6, 6.1 – 11, 11.1 – 16, 16.1 – 21, 21.1 – 26, 26.1 – 31, and 31.1 – 36.

Now that we have the boundaries of all the bins, we can go through the dataset and count the number of values that fall in each one. The first value in the dataset is 18, which falls in the 16.1 – 21 bin. The second value is 16, which falls in the 11.1 – 16 bin. The third value is 11, which falls in the 6.1 – 11 bin. Continuing this process until all the data has been counted gives us the following table.

| Bin | Number of Values in Bin |
|---|---|
| 1 – 6 | 9 |
| 6.1 – 11 | 14 |
| 11.1 – 16 | 8 |
| 16.1 – 21 | 5 |
| 21.1 – 26 | 3 |
| 26.1 – 31 | 0 |
| 31.1 – 36 | 1 |
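The counting procedure described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `bin_counts` is hypothetical, and the sample list contains only the three values named in the text (18, 16, 11) plus a few made-up values, since the full dataset is not reproduced here. It assumes the convention used above, where the first bin includes the minimum and every later bin starts just above the previous bin's upper edge.

```python
import math


def bin_counts(values, width, start=None):
    """Count values into equal-width bins, returning (low, high, count) tuples.

    The first bin covers [start, start + width]. Every later bin excludes
    its lower edge and includes its upper edge, matching bins written as
    1 - 6, 6.1 - 11, 11.1 - 16, and so on.
    """
    if start is None:
        start = min(values)
    counts = {}
    for v in values:
        if v < start:
            raise ValueError(f"value {v} is below the first bin edge {start}")
        # A value sitting exactly on an upper edge stays in the lower bin;
        # the minimum itself is clamped into bin 0.
        k = max(math.ceil((v - start) / width) - 1, 0)
        counts[k] = counts.get(k, 0) + 1
    n_bins = max(counts) + 1
    return [
        (start + k * width, start + (k + 1) * width, counts.get(k, 0))
        for k in range(n_bins)
    ]


# 18, 16, and 11 come from the text; the remaining values are hypothetical.
sample = [18, 16, 11, 1, 4, 6, 9, 33]
table = bin_counts(sample, width=5)

for low, high, count in table:
    print(f"{low:>2} - {high:<2} : {count}")
```

With integer data and an integer bin width, the edge comparisons are exact. For floating-point data, values very close to a bin edge may need extra care.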

The table above produces the histogram below.

Determining the bin width is the most critical aspect of creating a histogram. If the bin width is too small, you get quite a bit of information about the data, but because of the inherent variability in quantitative data, the graph can display a lot of noise, which can obscure what the data is trying to say. For example, the next histogram shows the same data with a bin width of 1.

The above graph, with a bin width of 1, shows how much variability there is in this dataset. In fact, there is so much variability that we can tell little else about the data besides its high variability.

On the other hand, if the bin width is too large, as in the next histogram, which has a bin width of 10, we lose a lot of detail, which also makes it difficult to draw conclusions from the graph.
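To make the effect of bin width concrete, the sketch below prints text histograms of one dataset at bin widths of 1, 5, and 10. The dataset `data` is hypothetical (it is not the dataset used in this article), and the helper names `counts_for_width` and `render` are illustrative assumptions. The binning follows the same convention as above: the first bin includes the minimum, and each later bin excludes its lower edge.

```python
import math


def counts_for_width(values, width):
    """Return (start, counts), where counts[k] is the tally for bin k."""
    start = min(values)
    n_bins = max(math.ceil((max(values) - start) / width), 1)
    counts = [0] * n_bins
    for v in values:
        # The minimum is clamped into bin 0; a value exactly on an upper
        # edge belongs to the bin that ends at that edge.
        k = max(math.ceil((v - start) / width) - 1, 0)
        counts[k] += 1
    return start, counts


def render(values, width):
    """Draw one row of '#' marks per bin, labeled by the bin's edges."""
    start, counts = counts_for_width(values, width)
    rows = []
    for k, count in enumerate(counts):
        low = start + k * width
        high = low + width
        rows.append(f"{low:>3} - {high:<3} {'#' * count}")
    return "\n".join(rows)


# Hypothetical dataset, made up for illustration only.
data = [1, 2, 2, 4, 5, 7, 7, 8, 9, 9, 10, 11, 12, 14, 16, 18, 19, 23, 25, 33]

for width in (1, 5, 10):
    print(f"bin width = {width}")
    print(render(data, width))
    print()
```

In this sketch, a width of 1 yields 32 mostly short or empty rows, a width of 10 collapses the data into 4 rows, and a width of 5 yields 7 rows that keep the overall shape visible.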