"A big computer, a complex algorithm and a long time does not equal science." -- Robert Gentleman
Friday, 31 December 2010
Monday, 8 November 2010
An R wrapper for the Google Prediction API
Since I got access to both Google Storage for Developers and the Google Prediction API (more details here and here), I decided to create a simple wrapper (just four basic functions so far) for playing with the Google Prediction API from R.
Here you can find the GitHub repository for the project; below are a few lines of code reproducing an example from the Google Prediction API website.
Download the source code from here.
Either source the functions contained in the R directory or install the package by typing (from the command line in a Unix-like environment):
R CMD INSTALL predictionapirwrapper_1.0.tar.gz
# start R and type (code highlighting thanks to Revolution Analytics Pretty R syntax highlighter):
library(predictionapirwrapper)

## The first stage of using the API is to acquire an authorization token. This can be done via this command:
token <- GetAuthToken(email="user@gmail.com", passwd="mypassword")

## This command begins training on data that has been previously uploaded to Google Storage.
GoogleTrain(auth_token=token$Auth, mybucket="data_languages", mydata="language_id.txt")

## Once training has started, this command checks the status of the training job and gets meta-information on the model (if available).
GoogleTrainCheck(auth_token=token$Auth, mybucket="data_languages", mydata="language_id.txt")

## When training has finished, this command issues a request for a new prediction from the model.
GooglePredict(auth_token=token$Auth, mybucket="data_languages", mydata="language_id.txt", myinput="La idioma mas fina")
All comments, corrections, and alternative code are more than welcome!
Update: a more complete and functional alternative can be found here.
Friday, 15 October 2010
R 2.12.0 is released!
Wednesday, 21 July 2010
R Cheat Sheets and more
Wednesday, 30 June 2010
Thursday, 22 April 2010
R 2.11.0 is released!
Friday, 19 March 2010
Balloon plot using ggplot2
Following Tal Galili's example and using part of his code, I want to reproduce the balloon plot you can see here using R and the excellent ggplot2 package by Hadley Wickham.
### I retrieve the data from the Google document you can find here using Tal Galili's code.
## I slightly modified Tal's code to include popularity stats:
supplement.popularity <- supplements.data[ss,7]
supplements.df <- na.omit(data.frame(supplement.name, supplement.benefits, supplement.popularity, supplement.score)) ## remove rows containing NAs
colnames(supplements.df) <- c("name", "benefits", "popularity", "score")
## For the sake of simplicity I select only the cardio metacondition
## (filter on the data frame's own column, since na.omit() may have dropped rows)
cardio <- supplements.df[supplements.df$benefits == "cardio", -2]
## For reproducibility I add the cardio data.frame so you can use it right away
cardio <- read.table(tc <- textConnection(
"name popularity score
2 'arginine' 1.080 3
10 'vitamin b3' 0.201 3
15 'omega 3' 4.000 3
22 'hawthorn' 0.442 4
27 'red yeast rice' 0.264 4
29 'vitamin d' 6.700 4
31 'omega 6' 2.000 4
35 'green tea' 26.100 5
37 'olive leaf' 0.224 5
41 'fish oil' 4.000 6
43 'red yeast rice' 0.264 6")); close(tc)
cardio$name <- gsub(" ", "\n", cardio$name) # substitute ' ' with '\n' in the names

library(ggplot2)
myTheme <- function(base_size = 10) {
  structure(list(
    panel.background = theme_rect(size = 1, colour = "lightgray"),
    panel.grid.major = theme_blank(),
    panel.grid.minor = theme_blank(),
    axis.line = theme_blank(),
    axis.text.x = theme_blank(),
    axis.ticks = theme_blank(),
    strip.background = theme_blank(),
    strip.text.y = theme_blank(),
    legend.background = theme_blank(),
    legend.key = theme_blank(),
    legend.key.size = unit(1.2, "lines"),
    legend.title = theme_text(size = 8, face = "bold", hjust = 0),
    legend.position = "right"
  ), class = "options")
}

s <- ggplot(cardio, aes(name, score)) +
  xlab(NULL) + ylab(NULL) + myTheme()
s <- s +
  geom_point(aes(size=popularity, colour=score, fill=score), legend=TRUE) +
  scale_y_continuous(
    breaks=as.numeric(levels(factor(cardio$score))),
    labels=c("Conflicting", "Promising", "Good", "Strong")) +
  scale_area(
    breaks=c(min(cardio$popularity), mean(cardio$popularity), max(cardio$popularity)),
    to=c(4,60)) +
  geom_text(aes(y=cardio$score, label=cardio$name, size=cardio$popularity/90), legend=FALSE)
#pdf("cardio.pdf",height=8,width=12);s;dev.off()
png("cardio.png",height=700,width=1000);s;dev.off()
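The theme code above relies on theme_rect(), theme_blank(), theme_text() and scale_area(), which later ggplot2 releases replaced with theme(), the element_*() functions and scale_size_area(). The following is only a rough sketch of how the same balloon plot might look with a recent ggplot2, not a faithful port: the name balloon_theme is my own, the label size is fixed instead of being mapped to popularity, and max_size = 30 is a guess at a comparable balloon size.

```r
library(ggplot2)

# Same cardio data as in the post; read.table(text = ...) replaces textConnection()
cardio <- read.table(text = "name popularity score
2 'arginine' 1.080 3
10 'vitamin b3' 0.201 3
15 'omega 3' 4.000 3
22 'hawthorn' 0.442 4
27 'red yeast rice' 0.264 4
29 'vitamin d' 6.700 4
31 'omega 6' 2.000 4
35 'green tea' 26.100 5
37 'olive leaf' 0.224 5
41 'fish oil' 4.000 6
43 'red yeast rice' 0.264 6", stringsAsFactors = FALSE)
cardio$name <- gsub(" ", "\n", cardio$name)  # one word per line in the labels

# Hypothetical name; mirrors the settings of myTheme() with the element_*() API
balloon_theme <- theme(
  panel.background  = element_rect(fill = "white", colour = "lightgray"),
  panel.grid.major  = element_blank(),
  panel.grid.minor  = element_blank(),
  axis.line         = element_blank(),
  axis.text.x       = element_blank(),
  axis.ticks        = element_blank(),
  strip.background  = element_blank(),
  strip.text.y      = element_blank(),
  legend.background = element_blank(),
  legend.key        = element_blank(),
  legend.key.size   = unit(1.2, "lines"),
  legend.title      = element_text(size = 8, face = "bold", hjust = 0),
  legend.position   = "right"
)

s <- ggplot(cardio, aes(name, score)) +
  geom_point(aes(size = popularity, colour = score)) +
  # Simplification: fixed label size rather than size = popularity/90
  geom_text(aes(label = name), size = 3, show.legend = FALSE) +
  scale_y_continuous(
    breaks = sort(unique(cardio$score)),
    labels = c("Conflicting", "Promising", "Good", "Strong")) +
  scale_size_area(
    max_size = 30,  # assumed value, tune to taste
    breaks = c(min(cardio$popularity), mean(cardio$popularity), max(cardio$popularity))) +
  xlab(NULL) + ylab(NULL) +
  balloon_theme

# To write the figure to disk:
# ggsave("cardio.png", s, width = 10, height = 7)
```

scale_size_area() makes the point area proportional to popularity, which is what the old scale_area() did as well.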
Sunday, 7 March 2010
One R Tip A Day meets Tecnica Arcana
For Italian-speaking people only (sorry!).
Carlo, the curator of the excellent technology podcast Tecnica Arcana, interviewed me about my profession and about R. Here you can download the interview in mp3 format.
Thursday, 7 January 2010
Scatter plot with 4 axes labels and grid
Ravi, in this post (via the Revolutions blog), wanted to see the code that produces the left panel of Figure 3 in this article from the current issue of the R Journal. Below is my attempt to reproduce the plot:
rv <- seq(1.3, 2.9, .1)
rv <- rv[-grep("1.6", rv)] # remove R version 1.6
pckg.num <- c(110,129,162,219,273,357,406,548,647,739,911,1000,1300,1427,1614,1952)
rv.dates <- c("2001-06-21", "2001-12-17","2002-06-12","2003-05-27",
"2003-11-16","2004-06-05","2004-10-12","2005-06-18","2005-12-16", "2006-05-31",
"2006-12-12","2007-04-12","2007-11-16","2008-03-18","2008-10-18","2009-09-17")
pckg.fit <- lm(pckg.num~rv)
png("CRAN_packages.png")
par(mar=c(7, 5, 5, 3), las=2)
plot(as.POSIXct(rv.dates), pckg.num, xlab="",ylab="",col="red", log="y", pch=19, axes=FALSE)
axis.POSIXct(1, at=as.POSIXct(rv.dates), format="%Y-%m-%d") # one tick per release date
mtext("Date", side=1, line=5, las=1)
axis(2, at=c(100,200,300,400,500,600,800,1000,1200,1500,2000))
mtext("Number of CRAN Packages", side=2, line=3, las=3)
axis.POSIXct(3, at=as.POSIXct(rv.dates), labels=as.character(rv)) # label each release date with its R version
mtext("R Version", side=3, line=3, las=1)
axis(4, pckg.num)
abline(v=as.POSIXct(rv.dates), col="lightgray", lty="dashed")
abline(h=pckg.num, col="lightgray", lty="dashed")
box()
abline(lm(log10(pckg.num)~as.POSIXct(rv.dates)), col="red")
dev.off()
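The red line in the plot is a straight-line fit of log10(pckg.num) against the release date, so its slope can be read as an exponential growth rate. As a rough sketch (not part of the original figure), the same data can be refitted with time measured in years; the 365.25 days-per-year conversion and the variable names years, growth.per.year and doubling.time are my own assumptions.

```r
# Package counts and release dates copied from the code above
pckg.num <- c(110,129,162,219,273,357,406,548,647,739,911,1000,1300,1427,1614,1952)
rv.dates <- as.Date(c("2001-06-21","2001-12-17","2002-06-12","2003-05-27",
                      "2003-11-16","2004-06-05","2004-10-12","2005-06-18",
                      "2005-12-16","2006-05-31","2006-12-12","2007-04-12",
                      "2007-11-16","2008-03-18","2008-10-18","2009-09-17"))

# Time since the first release, in (approximate) years
years <- as.numeric(difftime(rv.dates, rv.dates[1], units = "days")) / 365.25

# Same log-linear model as the red line, but with a slope per year
fit <- lm(log10(pckg.num) ~ years)
slope <- unname(coef(fit)["years"])

growth.per.year <- 10^slope - 1      # relative increase in packages per year
doubling.time   <- log10(2) / slope  # years needed for the count to double

cat(sprintf("Growth per year: %.1f%%\nDoubling time: %.2f years\n",
            100 * growth.per.year, doubling.time))
```

Fitting on the log scale matches the log="y" axis used in the plot; a fit on the raw counts would not correspond to the straight red line.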
Tuesday, 5 January 2010