Handling missing values

Four special functions are available for the handling of missing values. The boolean function missing() takes the name of a variable as its single argument; it returns a series with value 1 for each observation at which the given variable has a missing value, and value 0 otherwise (that is, if the given variable has a valid value at that observation). The function ok() is complementary to missing(); it is simply shorthand for !missing() (where ! is the boolean NOT operator).

For example, one can count the missing values for variable x using


      genr nmiss_x = sum(missing(x))
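
As a rough illustration of what missing() and ok() compute, here is a small Python model of the same logic. It is only a sketch, not gretl code: it assumes a series is a plain list of floats and uses NaN as a stand-in for the missing code, and the helper names simply mirror the gretl functions.

```python
import math

# Python model of gretl's missing() and ok(). Assumption of this sketch:
# a series is a list of floats, with float("nan") standing in for the missing code.

def missing(series):
    """Return 1 where the value is missing (NaN), 0 otherwise."""
    return [1 if math.isnan(v) else 0 for v in series]

def ok(series):
    """Complement of missing(): 1 where the value is valid, 0 otherwise."""
    return [1 - m for m in missing(series)]

x = [3.0, float("nan"), 5.0, float("nan"), 0.0]
nmiss_x = sum(missing(x))  # analogue of: genr nmiss_x = sum(missing(x))
print(nmiss_x)  # 2
print(ok(x))    # [1, 0, 1, 0, 1]
```

Note that a genuine zero (the last observation) counts as valid: missing() tests only for the missing code, not for the value 0.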

The function zeromiss(), which again takes a single series as its argument, returns a series in which all zero values are set to the missing code. It should be used with caution, since one does not want to confuse missing values with zeros, but it can be useful in some contexts. For example, one can determine the first valid observation for a variable x using


      # observation index: 1, 2, ..., T
      genr time
      # index of the first observation at which x is not missing
      genr x0 = min(zeromiss(time * ok(x)))
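
The logic of this trick can be traced step by step in a Python sketch. Again this only models the behaviour rather than reproducing gretl: NaN stands in for the missing code, and the hypothetical helper gmin() assumes, as the gretl example relies on, that min() ignores missing observations.

```python
import math

NA = float("nan")  # stand-in for gretl's missing code (assumption of this sketch)

def ok(series):
    """1 where the value is valid, 0 where it is missing."""
    return [0 if math.isnan(v) else 1 for v in series]

def zeromiss(series):
    """Replace every zero with the missing code; other values pass through."""
    return [NA if v == 0 else v for v in series]

def gmin(series):
    """Hypothetical helper: minimum over valid observations only,
    skipping missing values (assumed to mirror gretl's min() on a series)."""
    valid = [v for v in series if not math.isnan(v)]
    return min(valid) if valid else NA

x = [NA, NA, 4.2, NA, 7.1]
time = list(range(1, len(x) + 1))  # like genr time: 1, 2, ..., T

# time * ok(x) is 0 wherever x is missing and equals the observation number
# elsewhere; zeromiss() hides the zeros, so the minimum is the first valid obs.
x0 = gmin(zeromiss([t * o for t, o in zip(time, ok(x))]))
print(x0)  # 3
```

The multiplication by ok(x) is what turns "missing" into "zero" here, and zeromiss() then turns those zeros into values that min() skips.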

The function misszero() does the opposite of zeromiss(): it converts all missing values to zero.
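
The caution about confusing zeros with missing values can be made concrete. In the Python sketch below (a model of the behaviour, with NaN as the missing code), applying misszero() and then zeromiss() does not restore the original series, because a genuine zero becomes indistinguishable from a former missing value.

```python
import math

NA = float("nan")  # stand-in for the missing code (assumption of this sketch)

def misszero(series):
    """Replace every missing value with zero; valid values pass through."""
    return [0.0 if math.isnan(v) else v for v in series]

def zeromiss(series):
    """Replace every zero with the missing code; other values pass through."""
    return [NA if v == 0 else v for v in series]

x = [1.5, NA, 0.0, 2.0]
print(misszero(x))  # [1.5, 0.0, 0.0, 2.0]

# Not an exact inverse: after misszero(), the genuine zero at position 2
# looks just like the former missing value at position 1.
y = zeromiss(misszero(x))
print([math.isnan(v) for v in y])  # [False, True, True, False]
```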

It may be worth commenting on the propagation of missing values within genr formulae. The general rule is that in arithmetic operations involving two variables, if either variable has a missing value at observation t, the resulting series will also have a missing value at t. The one exception to this rule is multiplication by zero: zero times a missing value yields zero, since the product is zero whatever the unknown value may be.
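
The propagation rule, including the zero exception, can be written out explicitly. The Python sketch below models it with hypothetical helpers gadd() and gmul(), again using NaN for the missing code. Note that ordinary IEEE floating-point arithmetic, as in Python, gives NaN for 0 × NaN, so the exception has to be coded by hand here; this says nothing about how gretl implements it internally.

```python
import math

NA = float("nan")  # stand-in for the missing code (assumption of this sketch)

def gadd(a, b):
    """Hypothetical helper: elementwise sum, missing wherever either
    operand is missing at that observation."""
    return [NA if math.isnan(u) or math.isnan(v) else u + v for u, v in zip(a, b)]

def gmul(a, b):
    """Hypothetical helper: elementwise product with the zero exception."""
    out = []
    for u, v in zip(a, b):
        if math.isnan(u) or math.isnan(v):
            # the one exception: zero times a missing value is zero
            out.append(0.0 if (u == 0 or v == 0) else NA)
        else:
            out.append(u * v)
    return out

x = [1.0, NA, NA, 4.0]
z = [2.0, 0.0, 3.0, NA]
print(gadd(x, z))  # [3.0, nan, nan, nan]
print(gmul(x, z))  # [2.0, 0.0, nan, nan]
```

At the second observation the sum is missing but the product is zero, which is exactly the asymmetry the rule describes.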