Or, even easier:
data[which(data==0)] = NA
It's better to use data[data==0] <- NA.
Example:
set.seed(123)
data <- matrix(rnorm(100), ncol = 10)
data[sample(100, 20)] <- 0
data <- data.frame(data)
## the code below does not work on a data.frame
data[which(data==0)] = NA
Error in `[<-.data.frame`(`*tmp*`, which(data == 0), value = NA) :
new columns would leave holes after existing columns
## the code below does work
data[data==0] <- NA
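Why the which() version fails: with a single index, a data.frame behaves like a list of columns, so the integer positions returned by which() are read as column numbers (up to 100 here), not as cell positions. A logical matrix index is handled cell by cell instead. Below is a minimal sketch, assuming all columns are numeric, that compares the matrix case, the logical-matrix assignment, and a column-wise alternative using replace() inside lapply(). The names m, m1, df, df1 and df2 are just illustrative.

```r
set.seed(123)
m <- matrix(rnorm(100), ncol = 10)
m[sample(100, 20)] <- 0   # 20 distinct cells set to zero
df <- data.frame(m)

# On a matrix, the linear indices from which() address individual cells
m1 <- m
m1[which(m1 == 0)] <- NA

# On a data.frame, a logical matrix index also addresses individual cells
df1 <- df
df1[df1 == 0] <- NA

# Column-wise alternative: replace zeros in each column;
# df2[] <- ... keeps the data.frame structure and column names
df2 <- df
df2[] <- lapply(df2, function(col) replace(col, col == 0, NA))

# Count the missing values produced by each approach
sum(is.na(m1))
sum(is.na(df1))
sum(is.na(df2))
```

The lapply() form is handy when only some columns should be touched, since you can apply it to a subset such as df2[cols].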
Package gdata has a set of functions for working with missing values in general. See the vignette.
Thanks Gregor!
Hey,
Slightly off topic, but do you know how I can subscribe to r-help via reader?
T
Google Reader, that is. Thanks!
Hi Paolo,
I have a question that I can't seem to find a good answer to. I frequently have missing data that need to be replaced by values of another variable in the data set. For example:
d<-data.frame(x=c(1,2,3,4,5),y=c(NA,NA,6,7,8))
Wherever y is NA, I want it to take the value of x[i] from the same row. The only approach I've found that works is a loop, but it's extremely slow on large data frames. This is what I've used in the past:
for(i in 1:nrow(d)){
d[[i,2]][is.na(d[[i,2]])]=d[[i,1]]
}
Do you know of a more efficient solution to this?
Thanks!
Have you tried na.locf(d) from the zoo package?
This seems like a good question to post on Stack Overflow.
In the meantime:
d <- data.frame(x=c(1:10000), y=c(rep(NA,8000),1:2000))
system.time( for(i in 1:nrow(d)) d[[i,2]][is.na(d[[i,2]])]=d[[i,1]] )
# user system elapsed
# 2.506 0.023 2.529
d2 <- data.frame(x=c(1:10000), y=c(rep(NA,8000),1:2000))
system.time( d2 <- ifelse(is.na(d2$y),d2$x, d2$y) )
# user system elapsed
# 0.001 0.000 0.001
identical(d$y,d2)
HIH
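Another vectorized option, sketched here and not benchmarked, is to assign only into the missing positions of y instead of building a separate result vector with ifelse(). It assumes x and y have compatible types (both numeric in these examples); miss, big, miss_big and via_ifelse are illustrative names.

```r
# Small example from the question
d <- data.frame(x = c(1, 2, 3, 4, 5), y = c(NA, NA, 6, 7, 8))

# Logical index of the rows where y is missing
miss <- is.na(d$y)

# Copy the matching x values into those rows only
d$y[miss] <- d$x[miss]
d$y
# [1] 1 2 6 7 8

# Same idea on the larger data used in the timing above,
# checked against the ifelse() result
big <- data.frame(x = 1:10000, y = c(rep(NA, 8000), 1:2000))
via_ifelse <- ifelse(is.na(big$y), big$x, big$y)
miss_big <- is.na(big$y)
big$y[miss_big] <- big$x[miss_big]
identical(big$y, via_ifelse)
```

Because the assignment happens inside the data frame, there is no need to copy the result back into d$y afterwards.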
You rock. A simple question with a simple answer once you see it, but a hard answer to find.