Monday, 5 January 2009

Statistical Visualizations - Part 2

Two more plots inspired by this post.

> original
Europe Asia Americas Africa Oceania
1820-30 106487 36 11951 17 33333
1831-40 495681 53 33424 54 69911
1841-50 1597442 141 62469 55 53144
1851-60 2452577 41538 74720 210 29169
1861-70 2065141 64759 166607 312 18005
1871-80 2271925 124160 404044 358 11704
1881-90 4735484 69942 426967 857 13363
1891-00 3555352 74862 38972 350 18028
1901-10 8056040 323543 361888 7368 46547
1911-20 4321887 247236 1143671 8443 14574
1921-30 2463194 112059 1516716 6286 8954
1931-40 347566 16595 160037 1750 2483
1941-50 621147 37028 354804 7367 14693
1951-60 1325727 153249 996944 14092 25467
1961-70 1123492 427642 1716374 28954 25215
1971-80 800368 1588178 1982735 80779 41254
1981-90 761550 2738157 3615225 176893 46237
1991-00 1359737 2795672 4486806 354939 98263
2001-06 1073726 2265696 3037122 446792 185986
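The code in this post assumes that an object called original already exists in the workspace (it was created in the first part of the series). Here is a minimal sketch for recreating it directly from the table above. It assumes a reasonably recent R, in which read.table() accepts a text argument; older releases such as the R 2.8.1 mentioned in the comments may not. Because the header line has one field fewer than the data lines, read.table() automatically uses the decade labels as row names.

```r
# Rebuild the 'original' data frame from the table above.
# Assumption: read.table(text = ...) requires a reasonably recent R.
original <- read.table(header = TRUE, text = "Europe Asia Americas Africa Oceania
1820-30 106487 36 11951 17 33333
1831-40 495681 53 33424 54 69911
1841-50 1597442 141 62469 55 53144
1851-60 2452577 41538 74720 210 29169
1861-70 2065141 64759 166607 312 18005
1871-80 2271925 124160 404044 358 11704
1881-90 4735484 69942 426967 857 13363
1891-00 3555352 74862 38972 350 18028
1901-10 8056040 323543 361888 7368 46547
1911-20 4321887 247236 1143671 8443 14574
1921-30 2463194 112059 1516716 6286 8954
1931-40 347566 16595 160037 1750 2483
1941-50 621147 37028 354804 7367 14693
1951-60 1325727 153249 996944 14092 25467
1961-70 1123492 427642 1716374 28954 25215
1971-80 800368 1588178 1982735 80779 41254
1981-90 761550 2738157 3615225 176893 46237
1991-00 1359737 2795672 4486806 354939 98263
2001-06 1073726 2265696 3037122 446792 185986")

# The header has one field fewer than the data lines, so the decade labels
# become row names: 19 decades (rows) by 5 regions (columns)
str(original)
```

Once this has run, the barplot and bubble plot code below should work unchanged.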


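# Write the stacked bar chart to a PNG file in the working directory
png("immigration_barplot_me.png", width = 1419, height = 736)
library(RColorBrewer) # take a look at http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html
# display.brewer.all()  # shows every available ColorBrewer palette
# five colours from the ColorBrewer "Set1" palette, one per region
FD.palette <- c("#984EA3","#377EB8","#4DAF4A","#FF7F00","#E41A1C")
options(scipen=10)  # avoid scientific notation (e.g. 1e+06) in axis labels
par(mar=c(6, 6, 3, 3), las=2)  # wider margins, labels perpendicular to axes
# reorder the regions and transpose: regions become rows (stacked segments),
# decades become columns (bars)
data4bp <- t(original[,c(5,4,2,3,1)])
barplot(data4bp, beside=FALSE, col=FD.palette, border=FD.palette, space=1,
        ylab="Number of People",
        main="Migration to the United States by Source Region (1820 - 2006)",
        mgp=c(4.5,1,0))  # mgp moves the y-axis title away from the labels
# the legend is drawn separately, reversed so that its order matches the stack
legend("topleft", legend=rev(rownames(data4bp)), fill=rev(FD.palette))
box()
dev.off()  # close the PNG device: the plot goes to the file, not the screen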
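The line data4bp <- t(original[,c(5,4,2,3,1)]) does two jobs. barplot() draws one bar per column of a matrix and stacks that column's rows from the bottom up, so transposing turns decades into bars and regions into segments, while the column index c(5,4,2,3,1) fixes the stacking order. The sketch below applies the same transformation to two decades copied from the table. excerpt and m are illustrative names, not objects from the post.

```r
# Two decades copied from the table above (illustrative excerpt)
excerpt <- data.frame(
  Europe   = c(106487, 8056040),
  Asia     = c(36, 323543),
  Americas = c(11951, 361888),
  Africa   = c(17, 7368),
  Oceania  = c(33333, 46547),
  row.names = c("1820-30", "1901-10")
)

# Same transformation as data4bp: reorder the regions, then transpose so that
# regions are rows (stacked segments) and decades are columns (bars)
m <- t(excerpt[, c(5, 4, 2, 3, 1)])

# barplot() stacks rows from the bottom up: Oceania at the bottom, Europe on top
print(rownames(m))

# Each bar's total height is that decade's total immigration
print(colSums(m))

# rev() gives the top-down order used by the legend
print(rev(rownames(m)))
```

This is also why the legend() call uses rev(). The legend lists its entries top-down, so reversing the row names makes it match the stack, with Europe first.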





I find this 'bubble plot' visualization quite interesting; unfortunately, the R code I managed to produce is rather poor and unsatisfactory. Any improvement or suggestion is more than welcome!
Anyway, here is the code:

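# Write the bubble plot to a PNG file in the working directory
png("immigration_bubbleplot_me.png", width=1400, height=400)
par(mar=c(3, 6, 3, 2), col="grey85")  # light grey as the default drawing colour
mag <- 0.9  # size, in inches, of the largest bubble (symbols' inches argument)
# flatten the table into a single vector, column by column (Europe first)
original.vec <- as.matrix(original)
dim(original.vec) <- NULL
# x = decade (1..19, repeated for each region); y = region (Europe on top, y = 5)
symbols(rep(1:nrow(original), ncol(original)), rep(5:1, each=nrow(original)),
        circles=original.vec, inches=mag, fg="grey85", bg="grey20",
        xlim=range(1:nrow(original)), ylim=c(1,6), xaxt="n", yaxt="n",
        xlab="", ylab="", main="Immigration to the USA - 1821 to 2006",
        panel.first=grid())  # grid drawn before (underneath) the bubbles
axis(1, 1:nrow(original), labels=rownames(original), las=1, col="grey85")
axis(2, 1:ncol(original), labels=rev(colnames(original)), las=1, col="grey85")
dev.off()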
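The symbols() call relies on the way R flattens a matrix. dim(x) <- NULL (like as.vector()) reads the values column by column, so all 19 decades for Europe come first, then Asia, and so on. The x coordinates therefore cycle through the decades once per region, while the y coordinate stays fixed within each region. One further point: symbols() interprets circles as radii, so with raw counts a bubble's area grows with the square of the count. Passing square roots instead makes area proportional to count, which is the usual recommendation for bubble charts. The helper bubble_layout() below is a hypothetical sketch, not part of the original code.

```r
# Hypothetical helper (not from the original post): compute bubble
# positions and radii for symbols() from a decades-by-regions table.
bubble_layout <- function(counts) {
  counts <- as.matrix(counts)
  n_dec <- nrow(counts)  # decades -> x axis
  n_reg <- ncol(counts)  # regions -> y axis
  list(
    # as.vector() flattens column by column, so x cycles through the
    # decades once for every region...
    x = rep(seq_len(n_dec), n_reg),
    # ...while y is constant within a region; the first column goes on top
    y = rep(rev(seq_len(n_reg)), each = n_dec),
    # circles= expects radii; sqrt() makes bubble area proportional to count
    r = sqrt(as.vector(counts))
  )
}

# Tiny example: 2 decades x 2 regions
toy <- matrix(c(1, 4, 9, 16), nrow = 2,
              dimnames = list(c("d1", "d2"), c("A", "B")))
lay <- bubble_layout(toy)
str(lay)
```

With the post's data this could replace the manual rep() calls: lay <- bubble_layout(original), then symbols(lay$x, lay$y, circles = lay$r, inches = mag, ...), keeping the remaining arguments unchanged. By default, grid() places its lines at the default tick positions rather than at every decade. One way to line the grid up with the bubbles is panel.first = abline(v = 1:nrow(original), h = 1:ncol(original), col = "grey85").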




You can find the first part of this 'series', with code contributed by Yihui (thanks again!), here.

17 comments:

  1. For the bubble plot:

    1. Using R 2.8.1, I got the following warning:

    Warning message:
    In symbols(rep(1:19, dim(original)[[2]]), rep(5:1, each = dim(original)[[1]]), :
    "axes" is not a graphical parameter

    2. I think dim(original)[[2]] can be replaced by ncol(original) (and dim(original)[[1]] by nrow(original)) to make the code tidier.

    3. Another suggestion is to use the 'panel.first' argument, so that the grid lines are drawn underneath the bubbles without drawing the plot twice, e.g.
    symbols(1:10, 1:10, circles = runif(10), panel.first = grid())

    This argument actually comes from plot.default().

  2. and... yet another suggestion on "tidying up" R code:

    > library(animation)
    # copy the code to clipboard now, and
    > tidy.source()
    png("immigration_bubbleplot_me.png", width = 1400,
        height = 400)
    par(mar = c(3, 6, 3, 2))
    mag = 0.9
    original.vec <- as.matrix(original)
    dim(original.vec) <- NULL
    symbols(rep(1:dim(original)[[1]], dim(original)[[2]]),
        rep(5:1, each = dim(original)[[1]]), circles = original.vec,
        inches = mag, fg = "grey85", bg = "grey20", axes = F, ylab = "",
        xlab = "", xlim = range(1:dim(original)[[1]]), ylim = c(1, 6),
        main = "Immigration to the USA - 1821 to 2006")
    axis(1, 1:dim(original)[[1]], labels = rownames(original),
        las = 1, col = "grey85", tck = 1)
    axis(2, 1:dim(original)[[2]], labels = rev(colnames(original)),
        las = 1, col = "grey85", tck = 1)
    box(col = "grey85")
    symbols(rep(1:19, dim(original)[[2]]), rep(5:1, each = dim(original)[[1]]),
        circles = original.vec, main = "", inches = mag, fg = "grey85",
        bg = "grey20", axes = F, ylab = "", xlab = "", add = T,
        xlim = range(1:dim(original)[[1]]), ylim = c(1, 6))
    dev.off()
    # but a big disadvantage is you cannot keep the comments when using tidy.source()! :-(
    # any ideas on "reformatting R code"?

  3. Oh... my spaces were "removed" as I could not use the HTML tag "pre"...

  4. Yihui, thanks so much for your contribution!
    I really appreciate it!
    I fixed my code taking advantage of your suggestions.
    Regarding the "axes" warning:
    I noticed the warning and decided to ignore it because the result was what I expected...
    After your remark, I fixed it using xaxt="n" and yaxt="n".
    Thanks again!

  5. Ah, the elusive good-quality balloon or bubble plot. I have tried for ages to get this one to work (http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=60), but I never got it to work properly.

  6. Could you fix the immigration barplot? "original" is not defined in the statement below.
    data4bp <- t(original[,c(5,4,2,3,1)])

    I came across this site tonight and absolutely love it! Keep up the wonderful work. I'll contribute when I have bright ideas.

    Thanks so much, too, for the link to Penn State's NSF-funded ColorBrewer. Kudos to Mark Harrower & Cindy Brewer!

    Jim Burke

  7. Dear Jim,
    Thanks for the compliments! (*^_^*)
    You're absolutely right about the original object. I included it in the first part of this 'series', but to make things easier it's better to include it here too.
    Any contribution is more than welcome! :-)

  8. I'm still having issues with the barplot.
    I'm reading original as below:

    original <- read.table("original.txt", header=TRUE, sep = "")

    When I run your example in R 2.8.1, I get no output, just "null device 1" after the last line.

    My apologies; I am a bit new to R.

    Thanks
    Jim

  9. Dear Jim,
    Don't worry, at the beginning R can seem a bit 'idiosyncratic' ;-)

    Try this:

    Copy the table below, paste it into a plain-text editor (vim, emacs, nano, TextMate; not MS Word) and save it as original.txt in your R working directory:

    Europe Asia Americas Africa Oceania
    1820-30 106487 36 11951 17 33333
    1831-40 495681 53 33424 54 69911
    1841-50 1597442 141 62469 55 53144
    1851-60 2452577 41538 74720 210 29169
    1861-70 2065141 64759 166607 312 18005
    1871-80 2271925 124160 404044 358 11704
    1881-90 4735484 69942 426967 857 13363
    1891-00 3555352 74862 38972 350 18028
    1901-10 8056040 323543 361888 7368 46547
    1911-20 4321887 247236 1143671 8443 14574
    1921-30 2463194 112059 1516716 6286 8954
    1931-40 347566 16595 160037 1750 2483
    1941-50 621147 37028 354804 7367 14693
    1951-60 1325727 153249 996944 14092 25467
    1961-70 1123492 427642 1716374 28954 25215
    1971-80 800368 1588178 1982735 80779 41254
    1981-90 761550 2738157 3615225 176893 46237
    1991-00 1359737 2795672 4486806 354939 98263
    2001-06 1073726 2265696 3037122 446792 185986

    Now type the code below in R; it should work (row.names=1 should do the trick):

    original <- read.table("original.txt",row.names=1)
    png("immigration_barplot_me.png", width = 1419, height = 736)
    library(RColorBrewer) # take a look at http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html
    # display.brewer.all()
    FD.palette <- c("#984EA3","#377EB8","#4DAF4A","#FF7F00","#E41A1C")
    options(scipen=10)
    par(mar=c(6, 6, 3, 3), las=2)
    data4bp <- t(original[,c(5,4,2,3,1)])
    barplot( data4bp, beside=F,col=FD.palette, border=FD.palette, space=1, legend=F, ylab="Number of People", main="Migration to the United States by Source Region (1820 - 2006)", mgp=c(4.5,1,0) )
    legend( "topleft", legend=rev(rownames(data4bp)), fill=rev(FD.palette) )
    box()
    dev.off()

  10. Hi Paolo,

    The file read might not be the issue. Both your code and my code produce identical R data frames on input.

    Again, the code returns nothing. No errors except for the following at the very end.
    > dev.off()
    null device
    1

    This rather intrigues me. It almost seems as if a simple statement is missing, but no, barplot() does the work. Is barplot() failing silently somehow?

    My environment is MS Windows XP SP3 and R 2.8.1, and my R packages are up to date.

    Also, my R successfully runs the examples in R's help(barplot). And I appreciate that you included the display.brewer.all() statement, which nicely shows the palettes we can use.

    Thanks,
    Jim

  11. Dear Jim,
    the code writes a .png image file to your working directory (see help("png") and help("dev.off") for more information). If you wish to see the plot in an on-screen graphics window, simply use the code below (it is the same code without the png() ... dev.off() calls):

    library(RColorBrewer) # take a look at http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html
    # display.brewer.all()
    FD.palette <- c("#984EA3","#377EB8","#4DAF4A","#FF7F00","#E41A1C")
    options(scipen=10)
    par(mar=c(6, 6, 3, 3), las=2)
    data4bp <- t(original[,c(5,4,2,3,1)])
    barplot( data4bp, beside=F,col=FD.palette, border=FD.palette, space=1, legend=F, ylab="Number of People", main="Migration to the United States by Source Region (1820 - 2006)", mgp=c(4.5,1,0) )
    legend( "topleft", legend=rev(rownames(data4bp)), fill=rev(FD.palette) )
    box()

  12. Hi Paolo,

    Fantastico, meraviglioso ora tutto funziona perfettamente! (Fantastic, wonderful, now everything works perfectly!) I simply hadn't noticed the name the PNG file was saved under. This line that saves the file is a "più utile" (more useful) function, because it saves us the manual clicks otherwise needed to save the image R produces.

    Grazie (thanks), Jim

  13. You're welcome!
    By the way, good use of Italian! ;-)
    Bravo! :-)

  14. Awesome site, added to my Google Reader :-)

    I think the immigration data would be best visualised with two (or three) different plots (lattice-like, maybe). The top plot would show the absolute number of immigrants, the second the number of immigrants as a percentage of total inhabitants, and the third the percentage coming from each region of origin in each period.

    Just my 2 cents :-)

    Will be reading

  15. Dear ACH,
    Thanks for the suggestion; I quite agree.
    With these two consecutive posts I tried to recreate a few visualizations that I found interesting and that were easy to reproduce with short, simple R code.
    For now, I'm rather unsatisfied with the bubble plot, which I think could be reproduced better with the ggplot2 package (see Hadley Wickham's website among the links): a task for a future post...
    Contributions of alternative or better visualizations are more than welcome, including as a regular poster if anybody is interested!

  16. Hi there,
    Thank you for the very useful information.
    I would like to ask for some help to make the bubble plot look better and more informative.

    1) How could I add a legend-like key showing what the bubble sizes represent (percentage or number)?
    2) In the figure there is extra space at the beginning and end of the horizontal axis, i.e., an empty column in front of "1820-30" and another one after "2001-06". How can that gap be controlled?

    Thank you in advance for your help!

  17. This post is a bit outdated. To produce a better bubble plot with all the bells and whistles (legends, etc.), take a look at the following links, which point to more up-to-date information and better alternatives:

    http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/
    http://stackoverflow.com/questions/8131326/sp-r-package-and-missing-values
    http://www.matthewmaenner.com/blog/?p=150

    And if you want to use the Google Visualization API from R, take a look at the googleVis package:
    http://lamages.blogspot.it/2012/03/googlevis-0215-is-released-improved-geo.html

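Comment 14 suggests complementing the absolute counts with percentages. One of those views, each region's share of the immigrants arriving in a given decade, needs only prop.table() applied to the rows. The sketch below illustrates this with a hypothetical helper, region_shares(), and two decades copied from the table in the post. The other view suggested there (immigrants as a percentage of total inhabitants) would need population figures that are not in the post, so it is not shown.

```r
# Hypothetical helper: percentage of each decade's immigrants by region
region_shares <- function(counts) {
  # prop.table(x, 1) divides every row by its row total
  100 * prop.table(as.matrix(counts), 1)
}

# Two decades copied from the table in the post
excerpt <- data.frame(
  Europe   = c(106487, 1073726),
  Asia     = c(36, 2265696),
  Americas = c(11951, 3037122),
  Africa   = c(17, 446792),
  Oceania  = c(33333, 185986),
  row.names = c("1820-30", "2001-06")
)

shares <- region_shares(excerpt)
print(round(shares, 1))  # each row sums to 100

# Stacked 100% bars, reusing the region order of the barplot in the post
barplot(t(shares[, c(5, 4, 2, 3, 1)]), ylab = "Share of immigrants (%)",
        legend.text = TRUE)
```

With the full data, the same call would be barplot(t(region_shares(original)[, c(5, 4, 2, 3, 1)]), ...), keeping the colours and layout options from the absolute-count barplot.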