In many cases of statistical analysis, we are not sure whether our statistical model is correctly specified. For example, when using OLS, linearity and homoscedasticity are assumed, and some test statistics additionally assume that the errors are normally distributed or that we have a large sample. Since our results depend on these statistical assumptions, the results are only correct if our assumptions hold (at least approximately).
One solution to the problem of uncertainty about the correct specification is to use robust methods, for example robust regression or robust covariance (sandwich) estimators. A second approach is to test whether our sample is consistent with these assumptions.
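To illustrate the robust-covariance route, here is a minimal NumPy sketch of the simplest sandwich estimator, the HC0 (White) form. It is not the statsmodels implementation; the helper `ols_with_hc0` and the simulated data are hypothetical. Recent statsmodels releases expose this kind of estimator through `OLS(...).fit(cov_type="HC0")`.

```python
import numpy as np


def ols_with_hc0(X, y):
    """Hypothetical helper: OLS coefficients, classical SEs and HC0 (White) sandwich SEs."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Classical covariance sigma^2 (X'X)^-1 is only valid under homoscedasticity
    sigma2 = resid @ resid / (n - k)
    cov_classic = sigma2 * XtX_inv
    # Sandwich: "bread" (X'X)^-1 around the "meat" X' diag(e_i^2) X
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc0 = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov_classic)), np.sqrt(np.diag(cov_hc0))


# Simulated data whose error spread grows with x (heteroscedastic errors)
rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 * x)

beta, se_classic, se_hc0 = ols_with_hc0(X, y)
print("coefficients:", beta)
print("classical SE:", se_classic)
print("HC0 SE:      ", se_hc0)
```

The point estimates are the same as plain OLS; only the standard errors change, which is why sandwich estimators are a way to keep using OLS when homoscedasticity is in doubt.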
The following briefly summarizes specification and diagnostics tests for linear regression.
Note: Not all statistical tests in the sandbox are fully tested, and the API is still subject to change. Some of the tests are still on the wishlist.
For heteroscedasticity tests, the null hypothesis is that all observations have the same error variance, i.e. the errors are homoscedastic. The tests differ in which kind of heteroscedasticity is considered as the alternative hypothesis, and they also differ in their power against different types of heteroscedasticity.
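As an example of one such test, the following NumPy/SciPy sketch computes the studentized (Koenker) Breusch-Pagan statistic: regress the squared OLS residuals on variables suspected of driving the variance and use n times the R² of that auxiliary regression. The helper `breusch_pagan_lm` is hypothetical; recent statsmodels releases provide `het_breuschpagan` in `statsmodels.stats.diagnostic`.

```python
import numpy as np
from scipy import stats


def breusch_pagan_lm(resid, Z):
    """Hypothetical helper: studentized (Koenker) Breusch-Pagan test.

    resid : OLS residuals, shape (n,)
    Z     : variables suspected of driving the variance, including a
            constant column, shape (n, p)
    Returns (lm, p_value); under H0, lm is asymptotically chi2(p - 1).
    """
    n, p = Z.shape
    u2 = resid ** 2
    # Auxiliary regression of the squared residuals on Z
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    ssr = np.sum((u2 - Z @ gamma) ** 2)
    sst = np.sum((u2 - u2.mean()) ** 2)
    lm = n * (1.0 - ssr / sst)  # n * R^2 of the auxiliary regression
    return lm, stats.chi2.sf(lm, df=p - 1)


rng = np.random.default_rng(0)
x = np.linspace(1, 10, 300)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # error variance grows with x

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
lm, pvalue = breusch_pagan_lm(resid, X)
print(f"LM = {lm:.2f}, p-value = {pvalue:.3g}")
```

Because this alternative specifies variance that depends on the chosen regressors, the test has good power against that pattern but may miss heteroscedasticity driven by variables left out of `Z`.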
This group of tests checks whether the regression residuals are autocorrelated; the null hypothesis is that they are not. The tests assume that the observations are ordered by time.
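The simplest statistic in this group is the Durbin-Watson statistic, which compares successive residuals. A minimal sketch in NumPy follows; the function name mirrors `durbin_watson` in `statsmodels.stats.stattools` of recent releases, but this version is written independently and the AR(1) data are simulated.

```python
import numpy as np


def durbin_watson(resid):
    """Durbin-Watson statistic for residuals ordered in time.

    About 2: no first-order autocorrelation; toward 0: positive
    autocorrelation; toward 4: negative autocorrelation.
    """
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


# Residuals following an AR(1) process with strong positive correlation
rng = np.random.default_rng(0)
e = np.empty(500)
e[0] = rng.normal()
for t in range(1, 500):
    e[t] = 0.9 * e[t - 1] + rng.normal()

print(durbin_watson(e))               # well below 2
print(durbin_watson([1, -1, 1, -1]))  # 3.0 (sign flips every step)
```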
Tests for structural change (parameter stability) check whether all or some of the regression coefficients are constant over the entire data sample.
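When the candidate break point is known in advance, the classic option is the Chow F test: compare the residual sum of squares of one pooled regression with the sum from separate regressions on the two subsamples. The sketch below is a hypothetical NumPy/SciPy helper, not a statsmodels function, and it assumes a break in all coefficients at a known index.

```python
import numpy as np
from scipy import stats


def _rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r


def chow_test(X, y, split):
    """Hypothetical helper: Chow F test for a break in all k coefficients at index `split`."""
    n, k = X.shape
    rss_pooled = _rss(X, y)
    rss_sub = _rss(X[:split], y[:split]) + _rss(X[split:], y[split:])
    df_den = n - 2 * k
    f_stat = ((rss_pooled - rss_sub) / k) / (rss_sub / df_den)
    return f_stat, stats.f.sf(f_stat, k, df_den)


rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
X = np.column_stack([np.ones_like(x), x])
# Slope changes from 1 to 3 halfway through the sample
y = np.where(x < 50, 1.0 * x, 50.0 + 3.0 * (x - 50)) + rng.normal(size=100)

f_stat, pvalue = chow_test(X, y, split=50)
print(f"F = {f_stat:.1f}, p-value = {pvalue:.3g}")
```

If the break date is unknown, a single Chow test at a guessed point is not valid as is; tests based on recursive or cumulative residuals are the usual alternative.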
(Note: considerable cleaning still required)
conditionnum (scikits.statsmodels.stattools): still needs to be tested against Stata; cf. Greene (3rd ed.), pp. 57-58.
numpy.linalg.cond: for more general condition numbers, but provides no behind-the-scenes help with preparing the design matrix.
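Since `numpy.linalg.cond` works on the raw matrix, the condition number depends on the units of the regressors. A common convention in collinearity diagnostics (associated with Belsley, Kuh and Welsch) is to scale each column to unit length first. The sketch below assumes that convention; `scaled_condition_number` is a hypothetical helper and the data are simulated.

```python
import numpy as np


def scaled_condition_number(X):
    """Hypothetical helper: 2-norm condition number after scaling columns to unit length."""
    X = np.asarray(X, dtype=float)
    return np.linalg.cond(X / np.linalg.norm(X, axis=0))


rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)  # almost an exact copy of x1
x3 = rng.normal(size=n)                    # unrelated regressor

X_collinear = np.column_stack([np.ones(n), x1, x2])
X_ok = np.column_stack([np.ones(n), x1, x3])

print(scaled_condition_number(X_collinear))  # large: near-collinear design
print(scaled_condition_number(X_ok))         # small: well-conditioned design
```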
Robust regression results example, from example_rlm.py:

```python
import statsmodels.api as sm  # older releases: import scikits.statsmodels as sm

### Example for using Huber's T norm with the default
### median absolute deviation scaling
data = sm.datasets.stackloss.load()
data.exog = sm.add_constant(data.exog)
huber_t = sm.RLM(data.endog, data.exog, M=sm.robust.norms.HuberT())
hub_results = huber_t.fit()
print(hub_results.weights)
```

The weights give an idea of how much a particular observation is down-weighted according to the chosen norm and scaling.
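To see where such weights come from, here is a simplified iteratively reweighted least squares (IRLS) sketch with Huber's T weights and a MAD scale estimate, written in plain NumPy. It is an assumption-laden illustration, not the RLM code: details such as how the MAD is centered, the tuning constant handling and the convergence check may differ from statsmodels. The helpers `huber_weights`, `mad_scale` and `huber_irls` are hypothetical, and the data are synthetic.

```python
import numpy as np


def huber_weights(u, c=1.345):
    """Huber's T weights: 1 for |u| <= c, c / |u| beyond that."""
    abs_u = np.abs(u)
    return np.where(abs_u <= c, 1.0, c / np.maximum(abs_u, c))


def mad_scale(resid):
    """Median absolute deviation about the median, rescaled to estimate a normal sigma."""
    return np.median(np.abs(resid - np.median(resid))) / 0.6745


def huber_irls(X, y, c=1.345, max_iter=50, tol=1e-8):
    """Simplified IRLS fit with Huber's T norm; the MAD scale is re-estimated each step."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for _ in range(max_iter):
        resid = y - X @ beta
        w = huber_weights(resid / mad_scale(resid), c)
        sw = np.sqrt(w)
        # Weighted least squares step: scale rows by sqrt(weight)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    resid = y - X @ beta
    return beta, huber_weights(resid / mad_scale(resid), c)


# Straight line with small deterministic noise and one gross outlier
x = np.arange(20, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.1 * np.sin(x)
y[5] += 10.0

beta, weights = huber_irls(X, y)
print(beta)              # close to [1, 2]
print(weights.round(3))  # observation 5 gets by far the smallest weight
```

Observations with small scaled residuals keep weight 1, so Huber's T behaves like OLS for them and only limits the influence of large residuals.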
qqplot, scipy.stats.probplot
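A quick numerical companion to these plots: `scipy.stats.probplot` with the default `plot=None` returns the theoretical and ordered sample quantiles plus a least-squares fit, and the correlation `r` of that fit summarizes how straight the Q-Q plot is. The sketch uses simulated stand-ins for residuals; with real residuals, `sm.qqplot(resid, line="45", fit=True)` in recent statsmodels releases draws the corresponding plot.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
resid_normal = rng.normal(size=200)       # stand-in for well-behaved residuals
resid_skewed = rng.exponential(size=200)  # stand-in for skewed residuals

# osm: theoretical normal quantiles, osr: ordered sample values;
# r is the correlation of the points, near 1 when the normal fits well.
(osm, osr), (slope, intercept, r_normal) = stats.probplot(resid_normal, dist="norm")
_, (_, _, r_skewed) = stats.probplot(resid_skewed, dist="norm")
print(f"r (normal residuals): {r_normal:.4f}")
print(f"r (skewed residuals): {r_skewed:.4f}")
```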
Nothing yet.