Variable-width (diagonally cut) histogram — dhist • PEcAn.visualization

When constructing a histogram, it is common to make all bars the same width. One could also choose to make them all have the same area. These two options have complementary strengths and weaknesses; the equal-width histogram oversmooths in regions of high density, and is poor at identifying sharp peaks; the equal-area histogram oversmooths in regions of low density, and so does not identify outliers. We describe a compromise approach which avoids both of these defects. We regard the histogram as an exploratory device, rather than as an estimate of a density.
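The two extremes contrasted above can be sketched in base R. This is only an illustration of equal-width versus equal-area binning, not the diagonal-cut algorithm itself; the toy data and all variable names are assumptions made for the example.

```r
# Skewed toy sample: a sharp peak near zero and a long right tail.
set.seed(1)
x <- rexp(500)
nbins <- grDevices::nclass.Sturges(x)

# Equal-width breaks: evenly spaced across the data range.
breaks_width <- seq(min(x), max(x), length.out = nbins + 1)

# Equal-area breaks: sample quantiles, so each bin holds roughly
# the same number of observations.
breaks_area <- as.numeric(
  quantile(x, probs = seq(0, 1, length.out = nbins + 1))
)

counts_width <- as.numeric(table(cut(x, breaks_width, include.lowest = TRUE)))
counts_area <- as.numeric(table(cut(x, breaks_area, include.lowest = TRUE)))

# Bar height expressed as a density: count / (n * bin width).
heights_width <- counts_width / (length(x) * diff(breaks_width))
heights_area <- counts_area / (length(x) * diff(breaks_area))

# Compare how each scheme spreads the observations across bins.
print(rbind(equal_width = counts_width, equal_area = counts_area))
```

With the equal-width breaks most observations fall into the first few bins, while the quantile-based breaks give narrow bins near the peak and a few very wide bins in the tail.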

dhist(
  x,
  a = 5 * iqr(x),
  nbins = grDevices::nclass.Sturges(x),
  rx = range(x, na.rm = TRUE),
  eps = 0.15,
  xlab = "x",
  plot = TRUE,
  lab.spikes = TRUE
)

Arguments

x: numeric vector of data
a: scaling factor; defaults to 5 * iqr(x), i.e. five times the interquartile range of x
nbins: number of bins; defaults to the value given by Sturges' method (grDevices::nclass.Sturges)
rx: range covered by the histogram, from the left edge of the left-most bin to the right edge of the right-most bin; defaults to range(x, na.rm = TRUE)
eps: sets an artificial bound on the minimum width / maximum height of bins, as described in Denby and Mallows (2009), page 24
xlab: label for the x axis
plot: if TRUE, produces the plot; if FALSE, returns the heights, breaks and counts
lab.spikes: if TRUE, labels the % of data in the spikes
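As a rough sketch, the default argument values can be computed by hand before calling the function. Here `stats::IQR` stands in for the `iqr()` helper named in the usage, which is an assumption; the toy data and variable names are also illustrative. The call to `dhist` only runs if PEcAn.visualization is installed.

```r
set.seed(42)
x <- rnorm(200)  # toy data, assumed for illustration

# Values mirroring the defaults in the usage signature.
a_default <- 5 * stats::IQR(x)  # assumption: stats::IQR in place of iqr()
nbins_default <- grDevices::nclass.Sturges(x)  # ceiling(log2(200) + 1) = 9
rx_default <- range(x, na.rm = TRUE)

# Only attempt the call when the package is available.
if (requireNamespace("PEcAn.visualization", quietly = TRUE)) {
  res <- PEcAn.visualization::dhist(
    x,
    a = a_default,
    nbins = nbins_default,
    rx = rx_default,
    plot = FALSE  # return the computed values instead of plotting
  )
  str(res)
}
```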

Value

A list with two elements: heights (length n) and breaks (length n + 1), giving the heights and break points of the histogram bars.
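A list of this shape can be drawn with base graphics, one rectangle per bar. The sketch below uses a made-up list rather than real `dhist` output, and `draw_bars` is a hypothetical helper, not part of the package.

```r
# Hypothetical result with the documented shape: n heights, n + 1 breaks.
res <- list(
  heights = c(0.10, 0.45, 0.80, 0.30, 0.05),
  breaks = c(0, 1.5, 2.0, 2.4, 3.0, 6.0)
)
stopifnot(length(res$breaks) == length(res$heights) + 1)

# Bar i spans breaks[i] to breaks[i + 1], so the widths vary.
widths <- diff(res$breaks)

# Hypothetical helper: draw variable-width bars from heights and breaks.
draw_bars <- function(res, xlab = "x") {
  plot(range(res$breaks), c(0, max(res$heights)),
       type = "n", xlab = xlab, ylab = "height")
  rect(
    xleft = head(res$breaks, -1), ybottom = 0,
    xright = tail(res$breaks, -1), ytop = res$heights
  )
  invisible(res)
}

# Draw only in an interactive session so that scripts stay side-effect free.
if (interactive()) draw_bars(res)
```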

References

Lorraine Denby, Colin Mallows (2009). Variations on the Histogram. Journal of Computational and Graphical Statistics, 18(1): 21-31. doi:10.1198/jcgs.2009.0002.

Author

Lorraine Denby, Colin Mallows