Returns a dummy matrix given an array of categorical variables.
Parameters
data : array
col : 'string', int, or None
dictnames : bool, optional
drop : bool
Returns
dummy_matrix, [dictnames, optional]
Notes
This returns a dummy variable for EVERY distinct value of the categorical variable. If a recarray is provided, the names of the new variables are prefixed with an underscore so that attribute lookup is preserved. There is currently no name checking.
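To make the "one dummy per distinct value" behaviour concrete, here is a minimal sketch in plain NumPy. The helper name `dummy_sketch` is hypothetical and is not part of the library; it only illustrates the idea and is not the library's implementation.

```python
import numpy as np

def dummy_sketch(values):
    """Hypothetical helper: one indicator column per distinct value."""
    values = np.asarray(values)
    # np.unique returns the sorted distinct values (the category levels)
    levels = np.unique(values)
    # Compare every observation against every level via broadcasting,
    # giving an (n_observations, n_levels) matrix of 0.0 / 1.0 entries
    dummies = (values[:, None] == levels[None, :]).astype(float)
    return dummies, levels

# Same numerical categorical variable as in the examples: 25 observations, 5 levels
instr = np.floor(np.arange(10, 60, step=2) / 10)
dummies, levels = dummy_sketch(instr)
```

Each row of `dummies` contains exactly one 1, in the column of the level that observation belongs to.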
Examples
>>> import numpy as np
>>> import scikits.statsmodels as sm
Univariate examples
>>> import string
>>> string_var = [string.ascii_lowercase[i:i+5] for i in range(0, 25, 5)]
>>> string_var *= 5
>>> string_var = np.asarray(sorted(string_var))
>>> design = sm.tools.categorical(string_var, drop=True)
Or for a numerical categorical variable
>>> instr = np.floor(np.arange(10,60, step=2)/10)
>>> design = sm.tools.categorical(instr, drop=True)
With a structured array
>>> num = np.random.randn(25,2)
>>> struct_ar = np.zeros((25,1), dtype=[('var1', 'f4'),('var2', 'f4'), ('instrument','f4'),('str_instr','S5')])
>>> struct_ar['var1'] = num[:,0][:,None]
>>> struct_ar['var2'] = num[:,1][:,None]
>>> struct_ar['instrument'] = instr[:,None]
>>> struct_ar['str_instr'] = string_var[:,None]
>>> design = sm.tools.categorical(struct_ar, col='instrument', drop=True)
Or, using the string-valued column
>>> design2 = sm.tools.categorical(struct_ar, col='str_instr', drop=True)
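The following sketch shows, in plain NumPy, one way dummy fields could be attached to a structured array so that attribute lookup on a recarray still works. It is an illustration under stated assumptions, not the library's code: the field names (`group`, `_a`, and so on) and the exact underscore naming scheme are hypothetical, and it assumes that `drop=True` means the original categorical column is left out of the result.

```python
import numpy as np

# A small 1-D structured array with one string-valued categorical field
data = np.zeros(6, dtype=[('var1', 'f8'), ('group', 'U1')])
data['var1'] = np.arange(6)
data['group'] = ['a', 'b', 'c', 'a', 'b', 'c']

levels = np.unique(data['group'])
# Hypothetical naming scheme: an underscore followed by the level
new_names = ['_' + str(level) for level in levels]

# Keep every field except the categorical one (assumed meaning of drop=True)
kept = [(name, data.dtype[name]) for name in data.dtype.names if name != 'group']
out = np.zeros(data.shape, dtype=kept + [(name, 'f8') for name in new_names])
for name, _ in kept:
    out[name] = data[name]
# Fill each dummy field with the 0/1 indicator for its level
for name, level in zip(new_names, levels):
    out[name] = (data['group'] == level)

# Viewing the result as a recarray allows attribute-style access, e.g. rec._a
rec = out.view(np.recarray)
```

Because no name checking is done in this sketch either, a level whose name collides with an existing field would raise an error when the new dtype is built.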