One R Tip A Day: Replacing 0 with NA - an evergreen from the list

Monday, 15 June 2009

This thread from the R-help list describes an evergreen tip that proves useful at least once in everyone's R practice.

10 comments:

Anonymous, 15 June 2009 at 14:38
Or, even easier:

data[which(data==0)] = NA
Paolo, 15 June 2009 at 14:55
It's better to use data[data==0] <- NA.
Example:
set.seed(123)
data <- matrix(rnorm(100), ncol = 10)
data[sample(100, 20)] <- 0
data <- data.frame(data)
## which() returns linear indices, which `[<-.data.frame` reads as column numbers
data[which(data==0)] = NA
## Error in `[<-.data.frame`(`*tmp*`, which(data == 0), value = NA) :
##   new columns would leave holes after existing columns
## the code below does work
data[data==0] <- NA
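
To make the difference easier to see, here is a minimal sketch on a toy data frame (the names df, mask and m are made up for illustration, not taken from the thread). It assumes all columns are numeric. Comparing a data frame with 0 gives a logical matrix of the same shape, and that matrix can be used directly as the replacement index, whereas which() is only safe on a plain matrix or vector.

```r
# Hypothetical toy data: two numeric columns containing a few zeros.
df <- data.frame(a = c(0, 1, 2), b = c(3, 0, 0))

# df == 0 is a logical matrix with the same dimensions as df,
# so it marks individual cells rather than whole columns.
mask <- df == 0
df[mask] <- NA
n_missing <- sum(is.na(df))

# On a plain matrix, linear indices are valid, so which() does no harm there.
m <- matrix(c(0, 1, 2, 3, 0, 0), ncol = 2)
m[which(m == 0)] <- NA
```

After this runs, every cell that held a 0 in df or m is NA and the other values are untouched.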
Gorjanc Gregor, 16 June 2009 at 22:55
Package gdata has a set of functions for working with missing values in general. See the vignette.
Paolo, 17 June 2009 at 07:51
Thanks Gregor!
Anonymous, 18 June 2009 at 20:00
Hey,

Slightly off topic, but do you know how I can subscribe to r-help via a reader?

T
Anonymous, 18 June 2009 at 20:00
Google Reader, that is. Thanks!
rtwillia, 23 September 2011 at 15:27
Hi Paolo,

I have a question that I can't seem to find a good answer to. I frequently have missing data that need to be replaced by values of another variable in the data set. For example:

d<-data.frame(x=c(1,2,3,4,5),y=c(NA,NA,6,7,8))

Where y=NA, I want it to assume the value of x[i]. The only thing that I've found that works is a loop, but it's enormously sluggish on large data frames. This is what I've used in the past:

for(i in 1:nrow(d)){
d[[i,2]][is.na(d[[i,2]])]=d[[i,1]]
}

Do you know of a more efficient solution to this?

Thanks!
Paolo, 23 September 2011 at 16:03
This seems like a good question to post on Stack Overflow.
In the meantime:

d <- data.frame(x=c(1:10000), y=c(rep(NA,8000),1:2000))
system.time( for(i in 1:nrow(d)) d[[i,2]][is.na(d[[i,2]])]=d[[i,1]] )
# user system elapsed
# 2.506 0.023 2.529
d2 <- data.frame(x=c(1:10000), y=c(rep(NA,8000),1:2000))
# vectorized alternative; note that d2 now holds the filled-in y vector, not a data frame
system.time( d2 <- ifelse(is.na(d2$y),d2$x, d2$y) )
# user system elapsed
# 0.001 0.000 0.001
identical(d$y,d2)
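
If the goal is to keep the result inside the data frame rather than in a separate vector, one possible variant (a sketch only, using the small example from the question; missing_y is a name made up here) is to index with is.na() and assign into the column:

```r
# Example data from the question: y is missing in the first two rows.
d <- data.frame(x = c(1, 2, 3, 4, 5), y = c(NA, NA, 6, 7, 8))

# Logical vector marking the rows where y is missing.
missing_y <- is.na(d$y)

# Copy x into y only for those rows; the other y values are left alone.
d$y[missing_y] <- d$x[missing_y]
```

This is also vectorized, so it avoids the row-by-row loop, and d stays a data frame with both columns.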

HIH
Anonymous, 6 May 2014 at 22:02
You rock. A simple question with a simple answer when you see it, but a hard answer to find.