Showing posts with label package.

Wednesday, 16 November 2011

Weather forecast and good development practices

Inspired by this tutorial, I thought it would be nice to have access to the weather forecast directly from the R command line, for example for a personalized start-up message such as the one below:
Weather summary for Trieste, Friuli-Venezia Giulia:
The weather in Trieste is clear. The temperature is currently 14°C (57°F). Humidity: 63%.
Fortunately, thanks to Duncan Temple Lang's always useful XML package (see here for a tutorial about XML programming in R), it is straightforward to write a few lines of R code that invoke the Google Weather API for the location of interest, retrieve the XML file, parse it with XPath and extract the required information:

library(XML)
address <- "Trieste"
url <- paste0("http://www.google.com/ig/api?weather=", URLencode(address))
xml <- xmlTreeParse(url, useInternalNodes = TRUE) # print `xml` to take a look at the raw reply
# Get the required information (each value is stored in the node's "data" attribute):
condition <- xpathSApply(xml, "//xml_api_reply/weather/current_conditions/condition", xmlGetAttr, "data")
temp_c <- xpathSApply(xml, "//xml_api_reply/weather/current_conditions/temp_c", xmlGetAttr, "data")
humidity <- xpathSApply(xml, "//xml_api_reply/weather/current_conditions/humidity", xmlGetAttr, "data")
# paste0() avoids the doubled spaces that paste() would add around each value
cat(paste0("The weather in ", address, " is ", condition, ". The temperature is ", temp_c, "°C. Humidity is ", humidity, "%.\n"))
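The XPath step can also be tried offline. The sketch below parses a small hand-written XML string that only mimics the element layout the XPath expressions above expect; it is not a captured reply from any service, and the helper name get_current() is made up for illustration.

```r
library(XML)

# Hand-written sample that mimics the layout queried above.
# It is NOT a real API reply; the values are placeholders.
sample_xml <- '<xml_api_reply version="1">
  <weather>
    <current_conditions>
      <condition data="Clear"/>
      <temp_c data="14"/>
      <humidity data="63"/>
    </current_conditions>
  </weather>
</xml_api_reply>'

# asText = TRUE tells xmlParse() that the argument is XML content, not a file or URL
doc <- xmlParse(sample_xml, asText = TRUE)

# Hypothetical helper: read the "data" attribute of one current_conditions child
get_current <- function(doc, field) {
  path <- paste0("//xml_api_reply/weather/current_conditions/", field)
  xpathSApply(doc, path, xmlGetAttr, "data")
}

condition <- get_current(doc, "condition")
temp_c    <- get_current(doc, "temp_c")
humidity  <- get_current(doc, "humidity")

cat(sprintf("The weather in %s is %s. The temperature is %s°C. Humidity: %s%%.\n",
            "Trieste", tolower(condition), temp_c, humidity))
```

Note that xmlGetAttr() returns character values, so temp_c would need as.numeric() before any arithmetic.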

Some time ago I came to the conclusion that the best way to organize my R code is to create packages, even for basic tasks. I know it seems like too much effort for such a trivial task (and in the past it was) but fortunately, thanks to Hadley Wickham's devtools package, package development has become a piece of cake (sort of)!

Below I present the minimal workflow I used to create this simple package. For a proper introduction to package development using devtools take a look at this link.

First create the skeleton for the project using the package.skeleton() function:
package.skeleton("pkg")
Read the './pkg/Read-and-delete-me' file, fill in the DESCRIPTION fields according to your needs and delete './pkg/Read-and-delete-me'.
Now the devtools magic:
library("devtools")
pkg <- as.package("pkg") # pkg is the directory containing the structure created using package.skeleton()
Create your functions and documentation following the roxygen literate programming paradigm: basically, you write each function together with its documentation, using tags such as @param, @examples, etc. in a comment preamble to mark the different parts of the documentation, and devtools will automagically create the functions' documentation (.Rd files).
Then you test your code, try your examples, verify that your package passes the check without errors or warnings, build it and, if you like, upload it via FTP directly to CRAN (disclaimer: I have not tried this feature)!
load_all(pkg, reset = TRUE) # reload the package without having to restart R
document(pkg) # use roxygen2 to create the corresponding Rd files
run_examples(pkg) # run the examples of the different functions
devtools::check(pkg) # verify whether your package raises errors or warnings
devtools::build(pkg) # build the package bundle
install(pkg) # install your package
# release()
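As an illustration of the roxygen step described above, here is a minimal sketch of a documented function as it might appear in a file under pkg/R/. The function name weather_summary() and its arguments are hypothetical and are not taken from the RWeather package.

```r
#' Build a one-line weather summary
#'
#' @param address Character, name of the location.
#' @param condition Character, description of the current conditions.
#' @param temp_c Temperature in degrees Celsius.
#' @param humidity Relative humidity in percent.
#' @return A character string with the formatted summary.
#' @examples
#' weather_summary("Trieste", "clear", 14, 63)
#' @export
weather_summary <- function(address, condition, temp_c, humidity) {
  # %s is used for the numbers too, so both numeric and character input work
  sprintf("The weather in %s is %s. The temperature is currently %s°C. Humidity: %s%%.",
          address, condition, temp_c, humidity)
}

cat(weather_summary("Trieste", "clear", 14, 63), "\n")
```

The #' lines are ordinary comments to the R interpreter, so the file still sources normally; they only become .Rd documentation when document(pkg) is run.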

Final consideration: the devtools package has significantly improved my day-to-day workflow, and I want to thank Hadley Wickham for this and all the other valuable packages he has given the R community!
P.S. If you would like to install the RWeather package I created using devtools, you can do so by typing:
install.packages("RWeather", repos="http://R-Forge.R-project.org")
or download the source code from here.
P.S.2 I'd like to thank Kay Cichini for this post, which explains how to set up syntax highlighting for R code on Blogger.

Update: Thanks to the useful info I got from this Python module, RWeather can now show weather information from the Yahoo! Weather, Google Weather and NOAA APIs.
From now on, the stable version of the package can be installed directly from CRAN:
install.packages("RWeather")

Wednesday, 27 July 2011

Word Cloud in R

A word cloud (or tag cloud) can be a handy tool when you need to highlight the most commonly cited words in a text using a quick visualization. Of course, you can use one of the several online services, such as wordle or tagxedo, which are very feature-rich and have a nice GUI. Being an R enthusiast, I always wanted to produce this kind of image within R and now, thanks to Ian Fellows' recently released wordcloud package, I finally can!
In order to test the package I retrieved the titles of the XKCD web comics included in my RXKCD package and produced a word cloud based on the titles' word frequencies calculated using the powerful tm package for text mining (I know, it is like killing a fly with a bazooka!).

library(RXKCD)
library(tm)
library(wordcloud)
library(RColorBrewer)
path <- system.file("xkcd", package = "RXKCD")
datafiles <- list.files(path)
xkcd.df <- read.csv(file.path(path, datafiles))
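# VectorSource works on a plain character vector; recent tm versions require
# doc_id/text columns for DataframeSource. Column 3 is used as the text.
xkcd.corpus <- Corpus(VectorSource(as.character(xkcd.df[, 3])))
xkcd.corpus <- tm_map(xkcd.corpus, removePunctuation)
xkcd.corpus <- tm_map(xkcd.corpus, content_transformer(tolower))
xkcd.corpus <- tm_map(xkcd.corpus, removeWords, stopwords("english"))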
tdm <- TermDocumentMatrix(xkcd.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word, d$freq, scale = c(8, .3), min.freq = 2, max.words = 100, random.order = TRUE, rot.per = .15, colors = pal, vfont = c("sans serif", "plain"))
dev.off()
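To make clear what the tm pipeline above computes, here is a minimal base R sketch of the same idea (lower-case, strip punctuation, drop stop words, count). The sample titles and the tiny stop-word list are made up for illustration; tm's stopwords("english") list is much longer.

```r
# Made-up sample titles, only for illustration
titles <- c("The cat and the hat", "A cat in a box", "Hat, box, and cat!")

# Hand-picked stop words for this toy example
stop_words <- c("the", "and", "a", "in")

# Lower-case, remove punctuation, then split on whitespace
clean <- gsub("[[:punct:]]", "", tolower(titles))
words <- unlist(strsplit(clean, "[[:space:]]+"))

# Drop empty strings and stop words
words <- words[nzchar(words) & !(words %in% stop_words)]

# Count occurrences and sort by decreasing frequency
freq <- sort(table(words), decreasing = TRUE)
d <- data.frame(word = names(freq), freq = as.vector(freq),
                stringsAsFactors = FALSE)
print(d)
```

The resulting data frame has the same word/freq layout as the one passed to wordcloud() above.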

As a second example, inspired by this post from the eKonometrics blog, I created a word cloud from the descriptions of the 3177 R packages listed at http://cran.r-project.org/web/packages.
require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
u = "http://cran.r-project.org/web/packages/available_packages_by_date.html"
t = readHTMLTable(u)[[1]]
# Column 3 is used as the text; VectorSource takes a plain character vector
ap.corpus <- Corpus(VectorSource(as.character(t[, 3])))
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, removeWords, stopwords("english"))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_packages.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=3,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()

As a third example, thanks to Jim's comment, I take advantage of Duncan Temple Lang's RNYTimes package to access user-generated content on the NY Times and produce a word cloud of today's comments on articles.
Caveat: in order to use the RNYTimes package you need an API key from The New York Times, which you can get (free of charge) by registering with The New York Times Developer Network from here.
require(XML)
require(tm)
require(wordcloud)
require(RColorBrewer)
install.packages("RNYTimes", repos = "http://www.omegahat.org/R", type = "source")
require(RNYTimes)
my.key <- "your API key here"
what= paste("by-date", format(Sys.time(), "%Y-%m-%d"),sep="/")
# what="recent"
recent.news <- community(what=what, key=my.key)
pagetree <- htmlTreeParse(recent.news, error=function(...){}, useInternalNodes = TRUE)
x <- xpathSApply(pagetree, "//*/body", xmlValue)
# do some clean up with regular expressions
x <- unlist(strsplit(x, "\n"))
x <- gsub("\t","",x)
x <- sub("^[[:space:]]*(.*?)[[:space:]]*$", "\\1", x, perl=TRUE)
x <- x[!(x %in% c("", "|"))]
# One document per cleaned line of text
ap.corpus <- Corpus(VectorSource(as.character(x)))
ap.corpus <- tm_map(ap.corpus, removePunctuation)
ap.corpus <- tm_map(ap.corpus, content_transformer(tolower))
ap.corpus <- tm_map(ap.corpus, removeWords, stopwords("english"))
ap.tdm <- TermDocumentMatrix(ap.corpus)
ap.m <- as.matrix(ap.tdm)
ap.v <- sort(rowSums(ap.m),decreasing=TRUE)
ap.d <- data.frame(word = names(ap.v),freq=ap.v)
table(ap.d$freq)
pal2 <- brewer.pal(8,"Dark2")
png("wordcloud_NewYorkTimes_Community.png", width=1280,height=800)
wordcloud(ap.d$word,ap.d$freq, scale=c(8,.2),min.freq=2,
max.words=Inf, random.order=FALSE, rot.per=.15, colors=pal2)
dev.off()
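The regular-expression clean-up in the middle of the script is easier to follow on a tiny input. The sketch below runs the same steps on a made-up string standing in for the page body, using base R's trimws() in place of the sub() call; this is an equivalent shortcut, not what the original script uses.

```r
# Made-up raw text standing in for the extracted page body
raw <- "  First comment \n\t\n | \n\tSecond comment\t\n"

x <- unlist(strsplit(raw, "\n"))  # one element per line
x <- gsub("\t", "", x)            # drop tab characters
x <- trimws(x)                    # strip leading and trailing whitespace
x <- x[!(x %in% c("", "|"))]      # discard empty lines and lone separators

print(x)
```

Only the two comment lines survive, which is what then gets turned into a corpus.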


Thursday, 14 July 2011

R meets XKCD

Being a big fan of XKCD and, of course, of the R programming language, I thought that a package for displaying my favorite strips would be something (useless) but cool!
So, mimicking the approach (and the code) of the fortunes package (thanks Achim Zeileis!), I created a simple package (named RXKCD) which lets users display their favorite XKCD strip by selecting a specific number, picking one at random or simply displaying the current strip.
You can install the package using:
if (!require('RJSONIO')) install.packages('RJSONIO', repos = 'http://cran.r-project.org')
if (!require('png')) install.packages('png', repos = 'http://cran.r-project.org')
if (!require('ReadImages')) install.packages('ReadImages', repos = 'http://cran.r-project.org')
install.packages("RXKCD", repos="http://R-Forge.R-project.org")
And you can use it by typing:
library(RXKCD)
searchXKCD("someone is wrong")
getXKCD(386)
Below is the result (xkcd license):


Update: The updated version of the package, which is available from CRAN (just type install.packages("RXKCD")), allows the user to save the xkcd metadata database in a local directory (.Rconfig) and update it in order to have access to the latest XKCD info: see ?saveConfig and ?updateConfig.

Friday, 24 April 2009

Colors in the R terminal

Today I'd like to suggest a new R package, which you can download from here.
Still in early development, the xterm256 package lets you print text in the R terminal in different colours. You can find more information here.
The picture below depicts a basic example of its use.

Thursday, 9 August 2007

R package installation and administration

A short list of basic but useful commands for managing packages in R:

# install a package
install.packages("ROCR")
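# show the version of an installed package
packageVersion("pamr")
# update a single package (the first argument of update.packages() is a library path, not a package name)
update.packages(oldPkgs = "Cairo")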
# remove a package
remove.packages("RGtk2")
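
A small complement to the list above: a minimal sketch for checking whether a package is installed before acting on it. The helper name is_installed() is made up for illustration; the base package "stats" is queried because it ships with every R installation.

```r
# Hypothetical helper: TRUE if the package can be found in the current libraries.
# system.file() returns "" when the package is not installed.
is_installed <- function(pkg) {
  nzchar(system.file(package = pkg))
}

is_installed("stats")
packageVersion("stats")

# Install only when missing (commented out because it needs network access)
# if (!is_installed("ROCR")) install.packages("ROCR")

# Overview of installed packages and their versions (first few rows)
head(installed.packages()[, c("Package", "Version")])
```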