One R Tip A Day: Dealing with missing values

Sunday, 8 March 2009

Dealing with missing values

Two new quick tips from 'almost regular' contributor Jason:

Handling missing values in R can be tricky. Let's say you have a table
with missing values you'd like to read from disk. Reading in the table
with,

read.table( fileName )

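might fail. By default, read.table splits fields on any run of white
space, so an empty field vanishes and the row appears to have too few
columns. If your table is properly formatted, R can work out which
values are missing once you set the "sep" option in read.table: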

read.table( fileName, sep="\t" )

This tells R that all your columns are separated by TABs, regardless of
whether there's data there or not. So, make sure that your file on disk
really is fully TAB separated: if there is a missing data point you must
have a TAB to tell R that this datum is missing and to move to the next
field for processing.

Lastly, don't forget the "header=TRUE" option if you have a header line
in your file (the abbreviation "header=T" also works, but T can be
reassigned, so TRUE is safer).
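
As a minimal sketch of this first tip, the snippet below writes a small TAB-separated file to a temporary location and reads it back. The file contents, the column names (id, height, weight) and the variable names are made up for illustration. The default read is wrapped in try() because it is expected to stop with an error on this particular file.

```r
# Write a small TAB-separated file with a header line and two empty fields.
# The contents are hypothetical, chosen only to show the behaviour.
fileName <- tempfile(fileext = ".txt")
writeLines(c("id\theight\tweight",
             "1\t170\t65",
             "2\t\t80",      # height is missing
             "3\t182\t"),    # weight is missing
           fileName)

# Default separator (any white space): rows with an empty field look one
# field short, so read.table stops with an error, captured here by try().
defaultRead <- try(read.table(fileName, header = TRUE), silent = TRUE)
inherits(defaultRead, "try-error")

# Explicit TAB separator: the empty fields are read as NA.
myData <- read.table(fileName, sep = "\t", header = TRUE)
is.na(myData)
```

The empty fields end up as NA here because the columns are numeric; in a character column an empty field would be read as an empty string unless "na.strings" is set accordingly.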

Here's the 2nd tip:

Some algorithms in R don't support missing (NA) values. If you have a
data.frame with missing values and want to quickly remove the ROWS that
contain any missing data, try:

myData[rowSums(is.na(myData))==0, ]

To find NA values in your data you have to use the "is.na" function,
because a comparison such as x == NA returns NA rather than TRUE or FALSE.
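
The row filter above can be tried out on a toy example. This is only a sketch: the data.frame and its column names are invented for illustration, and the intermediate variables naPerRow and cleanData are hypothetical names, not part of the original tip.

```r
# A made-up data.frame with NAs in different columns.
myData <- data.frame(id     = 1:4,
                     height = c(170, NA, 182, 165),
                     weight = c(65, 80, NA, 58))

# Comparing with == does not detect NA: the result is itself NA.
NA == NA

# is.na() gives a logical matrix; rowSums() counts the NAs in each row.
naPerRow <- rowSums(is.na(myData))

# Keep only the rows that contain no NA at all.
cleanData <- myData[naPerRow == 0, ]
cleanData$id
```

Rows 2 and 3 each contain one NA, so only the rows with id 1 and 4 are kept.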

6 comments:

Anonymous, 9 March 2009 at 00:06
Regarding the second tip: you can also use the complete.cases function.
JoFrhwld, 9 March 2009 at 04:16
If there is only a specific column that has NAs, then you could also try

myData[!is.na(myData$Column),]
Paolo, 9 March 2009 at 09:27
Thanks to both of you for the useful comments!
Etienne, 21 March 2009 at 02:05
Using na.omit(myData) would remove every row containing an NA.
Anonymous, 5 February 2012 at 12:31
Etienne, thanks a lot for this tip!