## Thursday, December 11, 2008

### Tips from Jason

I want to thank Jason Vertrees for the following collection of useful tips!

(1) Use ~/.Rprofile for repeated environment initialization
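As a minimal sketch of what such a file might contain: R sources `~/.Rprofile` at startup (unless started with `--no-init-file` or `--vanilla`). The specific settings below (the CRAN mirror, the `digits` value, and the startup message) are illustrative assumptions, not recommendations from Jason.

```r
# ~/.Rprofile -- sourced automatically at the start of each R session.
# Every setting here is an illustrative assumption; adjust to taste.

options(
  repos  = c(CRAN = "https://cloud.r-project.org"),  # default CRAN mirror
  digits = 4                                         # print fewer significant digits
)

# .First() is called by R once startup has finished
.First <- function() {
  if (interactive()) {
    message("Loaded ~/.Rprofile on ", date())
  }
}
```

Helper functions you use in every session (such as the ones in the tips below) can be defined in the same file.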

(2) Ever had a large data frame printed across only 40% of your terminal window? You can resize the R display to fit the width of your terminal. Use the following "wideScreen" function:

```
# define wideScreen
wideScreen <- function() {
  # COLUMNS must be visible to R as an environment variable;
  # if it is not set, the current width is left unchanged
  cols <- suppressWarnings(as.integer(Sys.getenv("COLUMNS")))
  if (!is.na(cols)) options(width = cols)
}

#
# Test wideScreen
#
a <- rnorm(100)
a
wideScreen()
# notice how the data fill the screen
a
```

(3) Get familiar with colorspace. For example, if you need to color data points across a range, you can easily do:
```
##
## lut.R -- small function that returns a cool palette of nColors
##
library(colorspace)

lut <- function(nColors = 20) {
  # nColors evenly spaced hues in [0, 360); 360 is dropped because it equals hue 0
  hues <- seq(0, 360, length.out = nColors + 1)[-(nColors + 1)]
  return(hex(HSV(hues, 1, 1)))
}

# Now use lut.
plot(rnorm(100), col = lut(100))

# Now use just a range; use colors near purple; pretty
# much like getting subsections of rainbow()
plot(rnorm(30), col = lut(100)[71:100])
```

(4) Given a data set of m instances in N dimensions, find the K nearest neighbors of a given row/instance/point:
```
##
## neighbors -- find and return the K closest neighbors to "home"
##
neighbors <- function(dat, home, k = 10) {
  # Euclidean distance from each row of dat to home
  theHood <- apply(dat, 1, function(x) sqrt(sum((x - home)^2)))
  # row indices of the k smallest distances
  return(order(theHood)[1:k])
}

# Use it. Create a random 10x10 matrix and find which rows
# in d are closest (Euclidean-wise) to row 1.
# Row 1 itself comes back first, since its distance to itself is 0.
d <- matrix(rnorm(100), nrow = 10, ncol = 10)
neighbors(d, d[1, ], k = 3)
```

(5) A _VERY_ useful tip is to show users the difference in speed between a for loop and apply, sapply, mapply and tapply. A naively written for loop, especially one that grows its result one element at a time, is typically slow, whereas the apply family (see ?apply) is great. To show the difference, time the apply version of the neighbors function above against a for-loop version on a large data set.
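One way to set up that comparison is sketched below. The function names `dist_loop`, `dist_apply` and `dist_vec` are hypothetical, and the fully vectorized variant using `sweep()` and `rowSums()` is an addition for comparison rather than part of the original tip. No timings are quoted because they depend on your machine and R version.

```r
# Euclidean distance from every row of `dat` to `home`, written three ways.

# (a) explicit for loop with a preallocated result vector
dist_loop <- function(dat, home) {
  out <- numeric(nrow(dat))
  for (i in seq_len(nrow(dat))) {
    out[i] <- sqrt(sum((dat[i, ] - home)^2))
  }
  out
}

# (b) apply over rows, as in neighbors()
dist_apply <- function(dat, home) {
  apply(dat, 1, function(x) sqrt(sum((x - home)^2)))
}

# (c) vectorized: subtract home from every row, then take row sums
dist_vec <- function(dat, home) {
  sqrt(rowSums(sweep(dat, 2, home)^2))
}

set.seed(1)
big  <- matrix(rnorm(20000 * 10), ncol = 10)  # 20,000 points in 10 dimensions
home <- big[1, ]

# Run these and compare the "elapsed" column yourself
system.time(d_loop  <- dist_loop(big, home))
system.time(d_apply <- dist_apply(big, home))
system.time(d_vec   <- dist_vec(big, home))

# All three approaches should give the same distances
stopifnot(isTRUE(all.equal(d_loop, d_apply)),
          isTRUE(all.equal(d_loop, d_vec)))
```

Checking that the variants agree before comparing their timings guards against benchmarking a version that is fast only because it is wrong.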

(6) Another useful tip, also used in neighbors, is generating difference vectors and their lengths:
```
# two example points
a <- c(1, 2, 3)
b <- c(4, 6, 3)

# the difference vector between two vectors is very easy
# (named delta rather than c, so base R's c() is not masked)
delta <- a - b

# now the vector length (how far apart in Euclidean space these two points are)
sqrt(sum(delta^2))
```
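If you need the distance between every pair of points rather than a single pair, base R's `dist()` computes the same Euclidean length for all rows of a matrix at once. The three points below are made-up values chosen so the distances come out as whole numbers; this is a small sketch, not part of the original tips.

```r
# Three example points in 3 dimensions (illustrative values)
pts <- rbind(
  p1 = c(1, 2, 3),
  p2 = c(4, 6, 3),
  p3 = c(1, 2, 15)
)

# Manual distance between p1 and p2 via the difference vector
delta  <- pts["p1", ] - pts["p2", ]
manual <- sqrt(sum(delta^2))   # sqrt(9 + 16 + 0) = 5

# dist() defaults to Euclidean distance between rows
D <- as.matrix(dist(pts))
D

# The manual calculation matches the corresponding entry of D
stopifnot(isTRUE(all.equal(manual, D["p1", "p2"])))
```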