What is a Boxplot?
The box plot (also box-and-whisker plot) is a diagram that graphically represents the distribution of a variable measured on at least an ordinal scale. It combines several robust measures of location and dispersion in a single display. A box plot gives a quick impression of the range in which the data lie and how they are distributed over that range. To this end, it shows all values of the so-called five-number summary, i.e. the median, the two quartiles and the two extreme values.
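As a small illustration, the five-number summary can be computed directly in R. This is only a sketch: it uses the built-in mtcars data set and base R's fivenum(), and the labels attached to the result are chosen here for readability.

```r
# Five-number summary of fuel consumption (mpg) in the built-in mtcars data
five <- fivenum(mtcars$mpg)

# fivenum() returns an unnamed vector; these labels are added for readability
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# quantile() uses a different interpolation rule, so its quartiles can
# differ slightly from the hinges returned by fivenum()
print(quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75)))
```

The hinges returned by fivenum() are the values that R's boxplot() uses for the edges of the box.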
Examine the following items to learn more about the location and spread of the sample data:
The Median is represented by the line in the box. The median is a common measure of the location of the data. Half of the observations are less than or equal to this value, and the other half are greater than or equal to it.
The interquartile range box represents the middle 50% of the data. Its length is the distance between the first and the third quartile (Q3 - Q1).
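The Whiskers extend from both sides of the box. They represent the ranges of the bottom 25% and the top 25% of the data values, excluding the outliers. In R's boxplot(), each whisker by default ends at the most extreme observation that lies no more than 1.5 times the interquartile range from the box.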
Outliers, i.e. data values that are far away from the other data values, can have a strong impact on your results. Often, the easiest way to identify outliers is in a boxplot, where they are marked with dots or circles beyond the ends of the whiskers.
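The Notch displays an approximate 95% confidence interval around the median. If the notches of two boxes do not overlap, this is strong evidence that the two medians differ.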
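The items above can be checked numerically. The following sketch recomputes the fences, whiskers, outliers and notch limits by hand and compares them with base R's boxplot.stats(). It assumes the default whisker coefficient of 1.5 and uses the 8-cylinder cars from mtcars as example data; all variable names are illustrative.

```r
# mpg values of the 8-cylinder cars in the built-in mtcars data set
mpg8 <- mtcars$mpg[mtcars$cyl == 8]

# boxplot.stats() returns the numbers that boxplot() would draw
bs <- boxplot.stats(mpg8)            # default coef = 1.5

hinges <- bs$stats[c(2, 4)]          # lower and upper edge of the box
box_length <- diff(hinges)           # hinge-based interquartile range

# Fences: observations outside them are drawn individually as outliers
fences <- c(hinges[1] - 1.5 * box_length, hinges[2] + 1.5 * box_length)
manual_out <- mpg8[mpg8 < fences[1] | mpg8 > fences[2]]

# Whiskers end at the most extreme observations inside the fences
manual_whiskers <- range(mpg8[mpg8 >= fences[1] & mpg8 <= fences[2]])

# Notch limits: median +/- 1.58 * IQR / sqrt(n)
manual_notch <- bs$stats[3] + c(-1.58, 1.58) * box_length / sqrt(bs$n)

# Compare the manual results with boxplot.stats()
print(all.equal(manual_out, bs$out))
print(all.equal(manual_whiskers, bs$stats[c(1, 5)]))
print(all.equal(manual_notch, bs$conf))
```

The notch formula shows why the interval is only approximate: it relies on the interquartile range and the sample size rather than on an exact distribution of the median.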
Example with mtcars:
Boxplot of MPG by Car Cylinders
boxplot(mpg ~ cyl, data = mtcars, main = "Car Mileage Data", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
Notched and coloured Boxplot of MPG by Car Cylinders
boxplot(mpg ~ cyl, data = mtcars, main = "Car Mileage Data", notch = TRUE, col = c("steelblue", "darkgreen", "darkred"), xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
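To read the numbers behind these plots, boxplot() can be called with plot = FALSE, in which case it returns the statistics instead of drawing. The sketch below collects them in a table; the data frame and column names are illustrative choices, not part of the boxplot() output.

```r
# plot = FALSE returns the per-group statistics without drawing anything
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One row per cylinder group: whisker ends, box edges, median, notch limits
summary_table <- data.frame(
  cyl           = b$names,
  n             = b$n,
  lower_whisker = b$stats[1, ],
  q1            = b$stats[2, ],
  median        = b$stats[3, ],
  q3            = b$stats[4, ],
  upper_whisker = b$stats[5, ],
  notch_low     = b$conf[1, ],
  notch_high    = b$conf[2, ]
)
print(summary_table)

# Outliers together with the group they belong to
outliers <- data.frame(cyl = b$names[b$group], mpg = b$out)
print(outliers)

# Groups whose notch reaches beyond the box edges
notch_outside_box <- b$conf[1, ] < b$stats[2, ] | b$conf[2, ] > b$stats[4, ]
print(setNames(notch_outside_box, b$names))
```

With small groups such as these, a notch can extend past the edge of the box, and R then warns that some notches went outside the hinges when notch = TRUE is used. In that case the plain boxplot may be easier to read.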