This sandbox contains code that is, for various reasons, not ready to be included in statsmodels proper. It contains modules from the old stats.models code that have not been tested, verified, and updated to the new statsmodels structure: the Cox survival model, the mixed effects model with repeated measures, the generalized additive model, and the formula framework. The sandbox also contains code that is still being worked on until it fits the pattern of statsmodels or is sufficiently tested.
All sandbox modules have to be explicitly imported to indicate that they are not yet part of the core of statsmodels. The quality and testing of the sandbox code varies widely.
This is sandbox code
There are some examples in the sandbox.examples folder. Additional examples are directly included in the modules and in subfolders of the sandbox.
In this part we develop models and functions that will be useful for time series analysis. The initial focus is on the ARMA model, functions to simulate ARMA processes, and basic statistical properties such as the autocorrelation function and the periodogram, both estimated from data and computed as the theoretical statistic given the lag polynomials of the ARMA process. It also includes tools to work with AR and MA lag polynomials.
Some of the functions are currently written mainly to discover a way to use existing functions in scipy for time series analysis. Related functions are available in matplotlib, nitime, and scikits.talkbox. Those functions are designed more for use in signal processing, where longer time series are available, and work more in the frequency domain.
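To make the role of the lag polynomials concrete, here is a minimal sketch of how the MA representation (impulse response) of an ARMA process can be derived from its AR and MA polynomials. It is written in plain Python and is not the sandbox implementation. The function name `arma_impulse_response_sketch` is hypothetical, and the coefficient convention (both polynomials include the zero-lag term, with AR coefficients carrying the sign they have in the lag polynomial) is an assumption for this illustration.

```python
def arma_impulse_response_sketch(ar, ma, nobs=10):
    """Return the first `nobs` weights psi_0, psi_1, ... of the MA representation.

    Assumed convention (not confirmed by the sandbox docs): `ar` and `ma`
    hold lag-polynomial coefficients including the zero-lag term, so
    y_t = 0.5*y_{t-1} + e_t + 0.4*e_{t-1} is ar=[1, -0.5], ma=[1, 0.4].
    """
    psi = []
    for j in range(nobs):
        # Coefficient of L**j in ma(L); zero beyond the MA order.
        value = ma[j] if j < len(ma) else 0.0
        # Matching coefficients in ar(L) * psi(L) = ma(L) gives a recursion
        # in the previously computed weights.
        for k in range(1, min(j, len(ar) - 1) + 1):
            value -= ar[k] * psi[j - k]
        psi.append(value / ar[0])
    return psi


# For a pure AR(1) process with coefficient 0.8 the weights decay as 0.8**j.
weights = arma_impulse_response_sketch([1, -0.8], [1], nobs=5)
```

The recursion only needs the two coefficient lists, which is why the theoretical statistics listed in this section can be computed without any data.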
tsa.acf(x[, unbiased]) | autocorrelation function for 1d |
tsa.acovf(x[, unbiased, demean]) | autocovariance for 1D |
tsa.pacf_ols(x[, maxlag]) | Partial autocorrelation estimated with non-recursive OLS |
tsa.pacf_yw(x[, maxlag, method]) | Partial autocorrelation estimated with non-recursive yule_walker |
tsa.ccf(x, y[, unbiased]) | cross-correlation function for 1d |
tsa.ccovf(x, y[, unbiased, demean]) | crosscovariance for 1D |
tsa.ARIMA() | currently ARMA only, no differencing used - no I |
tsa.arma_acf(ar, ma[, nobs]) | theoretical autocorrelation function of ARMA process |
tsa.arma_acovf(ar, ma[, nobs]) | theoretical autocovariance function of ARMA process |
tsa.arma_generate_sample(ar, ma, nsample[, ...]) | generate a random sample of an ARMA process |
tsa.arma_impulse_response(ar, ma[, nobs]) | get the impulse response function (MA representation) for ARMA process |
tsa.movmean(x[, windowsize, lag]) | moving window mean |
tsa.movmoment(x, k[, windowsize, lag]) | non-central moment |
tsa.movorder(x[, order, windsize, lag]) | moving order statistics |
tsa.movstat | using scipy signal and numpy correlate to calculate some time series |
tsa.movvar(x[, windowsize, lag]) | moving window variance |
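As an illustration of what the estimated autocovariance and autocorrelation in the table above compute, the following plain-Python sketch shows one common definition. The names `acovf_sketch` and `acf_sketch` are hypothetical. The meaning given to `unbiased` (divide by `n - lag` instead of `n`) and `demean` (subtract the sample mean first) is an assumption based on the parameter names, not a statement about the sandbox code.

```python
def acovf_sketch(x, unbiased=False, demean=True):
    """Sample autocovariance of a 1d sequence at lags 0, ..., len(x) - 1."""
    n = len(x)
    mean = sum(x) / n if demean else 0.0
    xd = [xi - mean for xi in x]
    result = []
    for lag in range(n):
        # Sum of products of observations that are `lag` steps apart.
        s = sum(xd[t] * xd[t + lag] for t in range(n - lag))
        # Assumed meaning of `unbiased`: divide by the number of terms.
        divisor = (n - lag) if unbiased else n
        result.append(s / divisor)
    return result


def acf_sketch(x, unbiased=False):
    """Sample autocorrelation: autocovariance scaled by its lag-0 value."""
    acov = acovf_sketch(x, unbiased=unbiased)
    return [c / acov[0] for c in acov]


rho = acf_sketch([1.0, 2.0, 3.0, 4.0, 5.0])  # rho[0] is 1.0 by construction
```

The direct double loop is quadratic in the series length; it is meant to show the definition, not to be efficient on long series.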
The following two ANOVA functions are fully tested against the NIST test data for balanced one-way ANOVA. anova_oneway follows the same pattern as the one-way ANOVA function in scipy.stats but with higher precision for badly scaled problems. anova_ols produces the same results as the one-way ANOVA but uses the OLS model class. It also verifies against the NIST tests, with some problems in the worst-scaled cases. It shows how to do simple ANOVA using statsmodels in three lines and is best taken as a recipe.
anova_oneway(y, x[, seq]) | |
anova_ols(y, x) | |
wls_prediction_std(res[, exog, weights, alpha]) | calculate standard deviation and confidence interval for prediction |
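For readers unfamiliar with the statistic, the one-way ANOVA F test discussed above can be sketched in a few lines of plain Python. This is the textbook computation, not the sandbox implementation, and it has none of the extra precision for badly scaled problems. The name `anova_oneway_sketch` is hypothetical, and reading `y` as the observations and `x` as the group labels is an assumption.

```python
def anova_oneway_sketch(y, x):
    """Return (F statistic, df between groups, df within groups).

    `y` holds the observations and `x` the group label of each observation
    (an assumed reading of the argument names).
    """
    groups = {}
    for yi, xi in zip(y, x):
        groups.setdefault(xi, []).append(yi)

    n = len(y)
    grand_mean = sum(y) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Variation of the group mean around the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Variation of the observations around their own group mean.
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_b, df_w = anova_oneway_sketch(
    [1, 2, 3, 2, 3, 4, 5, 6, 7],
    [0, 0, 0, 1, 1, 1, 2, 2, 2],
)
```

Subtracting the grand mean in floating point is where this naive version loses accuracy when the data have a large mean relative to their spread, which is the kind of problem the NIST test data are designed to expose.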
The following are helper functions for working with dummy variables and generating ANOVA results with OLS. They are best considered as recipes since they were written with a specific use in mind. These functions will eventually be rewritten or reorganized.
try_ols_anova.data2dummy(x[, returnall]) | convert array of categories to dummy variables |
try_ols_anova.data2groupcont(x1, x2) | create dummy continuous variable |
try_ols_anova.data2proddummy(x) | creates product dummy variables from 2 columns of 2d array |
try_ols_anova.dropname(ss, li) | drop names from a list of strings, |
try_ols_anova.form2design(ss, data) | convert string formula to data dictionary |
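The idea behind converting a category array to dummy variables can be sketched as follows. This is a plain-Python illustration and does not reproduce the behavior of data2dummy. The name `data2dummy_sketch` and the `drop_last` parameter are hypothetical.

```python
def data2dummy_sketch(x, drop_last=False):
    """Return (dummies, levels) for a 1d sequence of category labels.

    Each row of `dummies` has a 1 in the column of the observation's
    category and 0 elsewhere. `drop_last` is a hypothetical option that
    omits the last column, which avoids perfect collinearity when the
    design matrix also contains a constant.
    """
    levels = sorted(set(x))
    columns = levels[:-1] if drop_last else levels
    dummies = [[1 if xi == level else 0 for level in columns] for xi in x]
    return dummies, levels


dummies, levels = data2dummy_sketch(["a", "b", "a", "c"])
```

With `drop_last=True`, an observation from the omitted category is represented by a row of zeros, so the omitted category acts as the reference level in a regression.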
The following are helper functions for group statistics where groups are defined by a label array. The qualifying comments for the previous group also apply to this group of functions.
try_catdata.cat2dummy(y[, nonseq]) | |
try_catdata.convertlabels(ys[, indices]) | convert labels based on multiple variables or string labels to unique |
try_catdata.groupsstats_1d(y, x, labelsunique) | use ndimage to get fast mean and variance |
try_catdata.groupsstats_dummy(y, x[, nonseq]) | |
try_catdata.groupstatsbin(factors, values) | uses np.bincount, assumes factors/labels are integers |
try_catdata.labelmeanfilter(y, x) | |
try_catdata.labelmeanfilter_nd(y, x) | |
try_catdata.labelmeanfilter_str(ys, x) | |
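Group statistics defined by a label array, as in the functions above, boil down to accumulating values per label. The following plain-Python sketch assumes integer labels starting at 0, in the spirit of the bincount-based approach mentioned for groupstatsbin. The name `group_stats_sketch` is hypothetical, and the choice of the population variance (dividing by the group size) is an assumption.

```python
def group_stats_sketch(labels, values):
    """Return (means, variances) per group for integer labels 0, 1, ...

    Empty groups get NaN. The variance divides by the group size
    (population variance), which is an assumption of this sketch.
    """
    n_groups = max(labels) + 1
    counts = [0] * n_groups
    sums = [0.0] * n_groups
    for g, v in zip(labels, values):
        counts[g] += 1
        sums[g] += v

    nan = float("nan")
    means = [s / c if c else nan for s, c in zip(sums, counts)]

    # Second pass: squared deviations from each group's own mean.
    sq_dev = [0.0] * n_groups
    for g, v in zip(labels, values):
        sq_dev[g] += (v - means[g]) ** 2
    variances = [d / c if c else nan for d, c in zip(sq_dev, counts)]
    return means, variances


means, variances = group_stats_sketch([0, 1, 0, 1, 2], [1.0, 10.0, 3.0, 14.0, 5.0])
```

String labels or labels built from several variables would first have to be mapped to consecutive integers, which is the kind of task convertlabels is listed for.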
In addition to these functions, sandbox regression still contains several examples that illustrate the use of the regression models of statsmodels.
The following are for fitting systems of equations models. Though the returned parameters have been verified as accurate, this code is still very experimental, and the usage of the models will very likely change significantly before they are added to the main codebase.
SUR(sys[, sigma, dfk]) | Seemingly Unrelated Regression |
Sem2SLS(sys[, indep_endog, instruments]) | Two-Stage Least Squares for Simultaneous equations |
lagmat(x, maxlag[, trim]) | create 2d array of lags |
lagmat2ds(x, maxlag0[, maxlagex, dropex, trim]) | generate lagmatrix for 2d array, columns arranged by variables |
grangercausalitytests(x, maxlag) | four tests for Granger causality of two time series |
pca(data[, keepdim, normalize, demean]) | principal components with eigenvector decomposition |
pcasvd(data[, keepdim, demean]) | principal components with svd |
graphics.qqplot(data[, dist, binom_n]) | qqplot of the quantiles of x versus the ppf of a distribution. |
descstats.sign_test(samp[, mu0]) | Signs test with mu0=0 by default (though |
descstats.descstats(data[, cols, axis]) | Prints descriptive statistics for one or multiple variables. |
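Of the tools listed above, the lag matrix is the easiest to misread, so here is a minimal plain-Python sketch of the idea behind a "2d array of lags". The name `lagmat_sketch` is hypothetical. The column order (lag 1 first) and the trimming (rows without a full set of lags are dropped) are assumptions for this illustration and may differ from what lagmat does for a given `trim` setting.

```python
def lagmat_sketch(x, maxlag):
    """Return rows [x[t-1], x[t-2], ..., x[t-maxlag]] for t = maxlag, ..., len(x) - 1.

    Rows for which not all lags are available are dropped (an assumed
    trimming rule for this sketch).
    """
    return [
        [x[t - lag] for lag in range(1, maxlag + 1)]
        for t in range(maxlag, len(x))
    ]


lags = lagmat_sketch([1, 2, 3, 4, 5], maxlag=2)
# lags == [[2, 1], [3, 2], [4, 3]]
```

Each row lines up with the observation `x[t]` it is meant to explain, which is the layout needed to regress a series on its own past values.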
None of the following modules from the old stats.models code is fully working. The formula framework is used by cox and mixed.
Mixed Effects Model with Repeated Measures using an EM Algorithm
scikits.statsmodels.sandbox.mixed
Cox Proportional Hazards Model
scikits.statsmodels.sandbox.cox
Generalized Additive Models
scikits.statsmodels.sandbox.gam
Formula
scikits.statsmodels.sandbox.formula