One R Tip A Day: How to Superimpose Histograms

giovedì 26 aprile 2007

How to Superimpose Histograms

Function inspired by the code of Martin Maechler found on the R-List at http://tolstoy.newcastle.edu.au/R/help/06/06/30059.html

superhist2pdf <- function(x, filename = "super_histograms.pdf",
dev = "pdf", title = "Superimposed Histograms", nbreaks ="Sturges") {
junk = NULL
grouping = NULL
for(i in 1:length(x)) {
junk = c(junk,x[[i]])
grouping <- c(grouping, rep(i,length(x[[i]]))) }
grouping <- factor(grouping)
n.gr <- length(table(grouping))
xr <- range(junk)
histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)
maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
if(dev == "pdf") { pdf(filename, version = "1.4") } else{}
if((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
cols <- hcl(h = seq(30, by=360 / n.gr, length = n.gr), l = 65, alpha = 0.5) }
else {
h.den <- c(10, 15, 20)
h.ang <- c(45, 15, -30) }
if(TC) {
plot(histL[[1]], xlim = xr, ylim= c(0, maxC), col = cols[1], xlab = "x", main = title) }
else { plot(histL[[1]], xlim = xr, ylim= c(0, maxC), density = h.den[1], angle = h.ang[1], xlab = "x") }
if(!transparent.cols) {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j]) } else {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j]) }
invisible()
if( dev == "pdf") {
dev.off() }
}

# How to use the function:
d1 = rnorm(1:100)
d2 = rnorm(1:100) + 4
# the input object MUST be a list!
l1 = list(d1,d2)
superhist2pdf(l1, nbreaks="Sturges")

15 commenti:

Unknown31 ottobre 2007 alle ore 15:50
Nice, thanks :)
Just change the function name to superhist when calling it, as you created the function with that name. Or declare the function itself with the name superhist2pdf.
RispondiElimina
Risposte
Paolo16 novembre 2007 alle ore 13:21
Thanks for the correction!
RispondiElimina
Risposte
Unknown10 gennaio 2009 alle ore 21:04
How can I change the number of breaks in the histogram?
RispondiElimina
Risposte
Paolo12 gennaio 2009 alle ore 09:36
Jake, thanks for the point. I modified the function so now it has a nbreaks argument;
see ?hist to check how to use that argument.
By the way, you can add how many arguments are available for the hist funcion, appending the desired arguments after hist in the below piece of code:
histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)
RispondiElimina
Risposte
Anonimo24 febbraio 2009 alle ore 18:01
Great code, thanks!

BTW, I cannot switch to the probability densities mode. If I add the argument freq=FALSE after breaks=nbreaks, I get the warning "In hist.default... argument 'freq' is not made use of" when executing the function superhist2pdf.

Any ideas?

Thanks again.
RispondiElimina
Risposte
Paolo27 febbraio 2009 alle ore 11:14
Dear Chiara,
SFTD, the problem seems to be explained in the following paragraph of the man page for hist (?hist):

freq: logical; if 'TRUE', the histogram graphic is a representation of frequencies, the 'counts' component of the result; if 'FALSE', probability densities, component 'density', are plotted (so that the histogram has a total area of one). Defaults to 'TRUE' _if and only if_ 'breaks' are equidistant
(and 'probability' is not specified).

The specific warning you pointed out seems to be raised if you try to use the plot=F and the freq argument (either freq=T or freq=F) together.
If the 'breaks' you specified are not equidistant (nbreaks argument) the histogram will plot probability densities.
RispondiElimina
Risposte
سید امیر مرعشی5 aprile 2009 alle ore 14:40
Hi,
very nice code. Still, I am a little confused. I am not an R expert. I did not really understand what changes should be done on the code to get the probability densities.
Thank you for your help,
Amir
RispondiElimina
Risposte
eressea3 maggio 2009 alle ore 21:59
alternative:

hist(x)
hist(y, add=TRUE)
RispondiElimina
Risposte
Paolo4 maggio 2009 alle ore 09:03
The problem with the default add argument is that it doesn't take account of the range of the data/distributions.
RispondiElimina
Risposte
Anonimo15 luglio 2010 alle ore 18:07
Hi,
This looks like great stuff but I am having problems. I keep playing around with it but I get two different errors depending on what I do. The errors are "null device (line break) 1" and "windows (line break) 2". Any ideas? I am on a windows machine if that helps.
Thanks in advance!
-Sean
RispondiElimina
Risposte
Paolo16 luglio 2010 alle ore 08:37
I checked the code on a Windows machine and it seems to work! null device (line break) 1 is not an error, it basically says that the image in pdf is ready and the devide pdf, used to render the image was closed. Check your working directory (getwd() and dir()) you should find a pdf named "super_histograms.pdf" with the produced image!
RispondiElimina
Risposte
David Springate15 marzo 2011 alle ore 10:04
Lovely function! Just what I was looking for!
One little thing: If you assign range(junk, na.rm = TRUE) to xr then it can deal with missing values.
Cheers,

David
RispondiElimina
Risposte
Paolo15 marzo 2011 alle ore 10:18
Thanks David! Very useful addition to this thread!
RispondiElimina
Risposte
Loz14 settembre 2011 alle ore 17:56
Thanks so much for this code!

I am being a bit dim, and can't work out how to convert it to greayscale (for a paper) - any thoughts?

Many thanks again!

Lauren
RispondiElimina
Risposte
Paolo15 settembre 2011 alle ore 08:42
The no brain answer would be to use some graphics editor to convert from RGB to Gray scale.
In Mac OS X you can use the preview application: save as > Quartz Filter > Gray Tone.
In R you can use the EBImage Bioconductor package to convert the color mode from Color to Grayscale in this way:
img= readImage("super_histograms.pdf")
colorMode(img) <- Grayscale
display(img)

You can find useful hints about creating overlapping histograms at the following links: http://stackoverflow.com/questions/3541713/how-to-plot-two-histograms-together-in-r http://stackoverflow.com/questions/5328452/overlaid-histograms-in-r-ggplot2-preferred

HIH
RispondiElimina
Risposte

Aggiungi commento