MCMCmixfactanal {MCMCpack} | R Documentation |
This function generates a sample from the posterior distribution of a mixed data (both continuous and ordinal) factor analysis model. Normal priors are assumed on the factor loadings and factor scores, improper uniform priors are assumed on the cutpoints, and inverse gamma priors are assumed for the error variances (uniquenesses). The user supplies data and parameters for the prior distributions, and a sample from the posterior distribution is returned as an mcmc object, which can be subsequently analyzed with functions provided in the coda package.
MCMCmixfactanal(x, factors, lambda.constraints = list(),
                data = parent.frame(), burnin = 1000, mcmc = 20000,
                thin = 1, tune = NA, verbose = 0, seed = NA,
                lambda.start = NA, psi.start = NA, l0 = 0, L0 = 0,
                a0 = 0.001, b0 = 0.001, store.lambda = TRUE,
                store.scores = FALSE, std.mean = TRUE, std.var = TRUE, ...)
x |
A one-sided formula containing the
manifest variables. Ordinal (including dichotomous) variables must
be coded as ordered factors. Each level of these ordered factors must
be present in the data passed to the function. NOTE: data input is different in
|
factors |
The number of factors to be fitted. |
lambda.constraints |
List of lists specifying possible equality
or simple inequality constraints on the factor loadings. A typical
entry in the list has one of three forms: |
data |
A data frame. |
burnin |
The number of burn-in iterations for the sampler. |
mcmc |
The number of iterations for the sampler. |
thin |
The thinning interval used in the simulation. The number of iterations must be divisible by this value. |
tune |
The tuning parameter for the Metropolis-Hastings
sampling. Can be either a scalar or a k-vector (where
k is the number of manifest variables). |
verbose |
A switch which determines whether or not the progress of
the sampler is printed to the screen. If |
seed |
The seed for the random number generator. If NA, the Mersenne
Twister generator is used with default seed 12345; if an integer is
passed it is used to seed the Mersenne twister. The user can also
pass a list of length two to use the L'Ecuyer random number generator,
which is suitable for parallel computation. The first element of the
list is the L'Ecuyer seed, which is a vector of length six or NA (if NA
a default seed of |
lambda.start |
Starting values for the factor loading matrix
Lambda. If |
psi.start |
Starting values for the error variance (uniqueness)
matrix. If |
l0 |
The means of the independent Normal prior on the factor
loadings. Can be either a scalar or a matrix with the same
dimensions as |
L0 |
The precisions (inverse variances) of the independent Normal
prior on the factor loadings. Can be either a scalar or a matrix with
the same dimensions as |
a0 |
Controls the shape of the inverse Gamma prior on the
uniquenesses. The actual shape parameter is set to |
b0 |
Controls the scale of the inverse Gamma prior on the
uniquenesses. The actual scale parameter is set to |
store.lambda |
A switch that determines whether or not to store the factor loadings for posterior analysis. By default, the factor loadings are all stored. |
store.scores |
A switch that determines whether or not to store the factor scores for posterior analysis. NOTE: This takes an enormous amount of memory, so should only be used if the chain is thinned heavily, or for applications with a small number of observations. By default, the factor scores are not stored. |
std.mean |
If |
std.var |
If |
... |
further arguments to be passed |
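The interaction between mcmc and thin can be checked with a few lines of base R before starting a long run. This is only a sketch: the helper name n_stored is hypothetical and not part of MCMCpack, and it assumes that one draw is kept every thin iterations after burn-in.

```r
# Hypothetical helper (not part of MCMCpack): number of draws kept when
# one draw is stored every `thin` iterations out of `mcmc` iterations.
n_stored <- function(mcmc, thin) {
  if (mcmc %% thin != 0) {
    stop("mcmc must be divisible by thin")
  }
  mcmc / thin
}

n_stored(mcmc = 20000, thin = 1)     # default settings
n_stored(mcmc = 1000000, thin = 50)  # settings used in the first example
```

When store.scores=TRUE, each stored draw also carries the factor scores of every observation, so the stored draw count is worth checking against available memory.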
The model takes the following form:
Let i=1,...,n index observations and j=1,...,K index response variables within an observation. An observed variable x_ij can be either ordinal with a total of C_j categories or continuous. The distribution of X is governed by an n by K matrix of latent variables Xstar and a series of cutpoints gamma. Xstar is assumed to be generated according to:
xstar_i = Lambda phi_i + epsilon_i
epsilon_i ~ N(0, Psi)
where xstar_i is the K-vector of latent variables specific to observation i, Lambda is the K by d matrix of factor loadings, and phi_i is the d-vector of latent factor scores. It is assumed that the first element of phi_i is equal to 1 for all i.
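The latent-variable equation can be illustrated by simulating from it in base R. The sketch below is not the sampler itself: all numeric values are made up, Psi is assumed to be diagonal, and the free factor scores are assumed to be standard normal draws.

```r
set.seed(1)
n <- 200   # observations
K <- 4     # manifest variables
d <- 2     # constant term plus one factor

# Factor scores: first element fixed at 1, remaining elements N(0, 1)
phi <- cbind(1, matrix(rnorm(n * (d - 1)), n, d - 1))

# Illustrative (made-up) loadings; column 1 multiplies the constant
Lambda <- cbind(c(0.5, -0.2, 0.0, 1.0),
                c(1.0,  0.8, 0.6, 0.4))

# Diagonal matrix of error variances (uniquenesses), also made up
Psi <- diag(c(1, 1, 0.5, 0.3))

# xstar_i = Lambda phi_i + epsilon_i, stacked row-wise into an n by K matrix
eps   <- matrix(rnorm(n * K), n, K) %*% sqrt(Psi)
xstar <- phi %*% t(Lambda) + eps
dim(xstar)
```

Because the first element of each phi_i is 1, the first column of Lambda plays the role of an intercept for each manifest variable.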
If the jth variable is ordinal, the probability that it takes the value c in observation i is:
pi_ijc = pnorm(gamma_jc - Lambda'_j phi_i) - pnorm(gamma_j(c-1) - Lambda'_j phi_i)
If the jth variable is continuous, it is assumed that xstar_{ij} = x_{ij} for all i.
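The category probabilities for an ordinal variable can be computed directly from the formula above with pnorm. The following sketch uses made-up values and assumes the usual ordinal probit conventions: the outermost cutpoints are -Inf and Inf, the first finite cutpoint is zero (in line with the normalization this page describes), and the latent error for the ordinal variable has unit variance.

```r
# Made-up values for one ordinal variable j with C_j = 4 categories
lambda_j <- c(-0.3, 1.2)                 # row j of Lambda
phi_i    <- c(1, 0.5)                    # factor scores; first element is 1
gamma_j  <- c(-Inf, 0, 0.8, 1.9, Inf)    # gamma_j0, ..., gamma_j4

eta <- sum(lambda_j * phi_i)             # Lambda'_j phi_i

# pi_ijc = pnorm(gamma_jc - eta) - pnorm(gamma_j(c-1) - eta), c = 1,...,4
pi_ij <- diff(pnorm(gamma_j - eta))
pi_ij
sum(pi_ij)                               # should be 1 up to floating-point error

# Equivalent view: draw the latent variable and cut it at the cutpoints
set.seed(2)
xstar_ij <- eta + rnorm(1)
x_ij <- cut(xstar_ij, breaks = gamma_j, labels = FALSE)
x_ij
```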
The implementation used here assumes independent conjugate priors for each element of Lambda and each phi_i. More specifically we assume:
Lambda_ij ~ N(l0_ij, L0_ij^-1), i=1,...,K, j=1,...,d
phi_i(2:d) ~ N(0, I), i=1,...,n
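Because L0 is a precision rather than a variance, it can help to translate it into a prior standard deviation. The sketch below does this for L0=.25 and draws one loading matrix from the implied prior. It assumes that a scalar l0 or L0 applies to every element of Lambda, and the dimensions are chosen for illustration only.

```r
L0 <- 0.25                 # prior precision (inverse variance)
prior_sd <- 1 / sqrt(L0)   # implied prior standard deviation
prior_sd                   # 2

# Scalar l0 and L0 written out as K by d matrices
K <- 5   # manifest variables
d <- 2   # constant term plus one factor
l0_mat <- matrix(0, K, d)
L0_mat <- matrix(L0, K, d)

# One draw of Lambda from the independent Normal prior
set.seed(3)
Lambda_draw <- matrix(rnorm(K * d, mean = l0_mat, sd = 1 / sqrt(L0_mat)),
                      K, d)
Lambda_draw
```

With the default L0=0 the precision is zero, which corresponds to an improper flat prior on the loadings, so no such draw is possible in that case.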
MCMCmixfactanal simulates from the posterior distribution using
a Metropolis-Hastings within Gibbs sampling algorithm. The algorithm
employed is based on work by Cowles (1996). Note that
the first element of phi_i is a 1. As a result, the
first column of Lambda can be interpreted as negative
item difficulty parameters. Further, the first
element gamma_1 is normalized to zero, and thus not
returned in the mcmc object.
The simulation proper is done in compiled C++ code to maximize
efficiency. Please consult the coda documentation for a comprehensive
list of functions that can be used to analyze the posterior sample.
As is the case with all measurement models, make sure that you have plenty of free memory, especially when storing the scores.
An mcmc object that contains the posterior sample. This object can be summarized by functions provided by the coda package.
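A few coda functions that are commonly applied to an mcmc object are sketched below. To keep the example fast and self-contained it uses a stand-in object built from random numbers rather than real sampler output, and the parameter names are made up.

```r
library(coda)

set.seed(4)
# Stand-in for sampler output: 2000 fake draws of two made-up parameters
draws <- cbind(Lambda1 = rnorm(2000, 0.5, 0.1),
               Lambda2 = rnorm(2000, -1.0, 0.2))
post <- mcmc(draws, start = 1001, thin = 1)

summary(post)        # posterior means, standard deviations and quantiles
effectiveSize(post)  # effective sample size for each parameter
HPDinterval(post)    # 95% highest posterior density intervals
geweke.diag(post)    # Geweke convergence diagnostic
```

The same calls should apply to the object returned by MCMCmixfactanal, since it is also of class mcmc.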
Kevin M. Quinn. 2004. “Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses.” Political Analysis. 12: 338-353.
Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park. 2011. “MCMCpack: Markov Chain Monte Carlo in R.” Journal of Statistical Software. 42(9): 1-21. http://www.jstatsoft.org/v42/i09/.
M. K. Cowles. 1996. “Accelerating Monte Carlo Markov Chain Convergence for Cumulative-link Generalized Linear Models.” Statistics and Computing. 6: 101-110.
Valen E. Johnson and James H. Albert. 1999. “Ordinal Data Modeling.” Springer: New York.
Daniel Pemstein, Kevin M. Quinn, and Andrew D. Martin. 2007. Scythe Statistical Library 1.0. http://scythe.wustl.edu.
Martyn Plummer, Nicky Best, Kate Cowles, and Karen Vines. 2002. Output Analysis and Diagnostics for MCMC (CODA). http://www-fis.iarc.fr/coda/.
plot.mcmc, summary.mcmc, factanal, MCMCfactanal, MCMCordfactanal, MCMCirt1d, MCMCirtKd
## Not run:
library(MCMCpack)
data(PErisk)

post <- MCMCmixfactanal(~courts+barb2+prsexp2+prscorr2+gdpw2,
                        factors=1, data=PErisk,
                        lambda.constraints = list(courts=list(2,"-")),
                        burnin=5000, mcmc=1000000, thin=50,
                        verbose=500, L0=.25, store.lambda=TRUE,
                        store.scores=TRUE, tune=1.2)
plot(post)
summary(post)

library(MASS)
data(Cars93)
attach(Cars93)
new.cars <- data.frame(Price, MPG.city, MPG.highway,
                       Cylinders, EngineSize, Horsepower,
                       RPM, Length, Wheelbase, Width, Weight, Origin)
rownames(new.cars) <- paste(Manufacturer, Model)
detach(Cars93)

# drop obs 57 (Mazda RX 7) b/c it has a rotary engine
new.cars <- new.cars[-57,]
# drop 3 cylinder cars
new.cars <- new.cars[new.cars$Cylinders!=3,]
# drop 5 cylinder cars
new.cars <- new.cars[new.cars$Cylinders!=5,]

# log-transform the skewed continuous variables
new.cars$log.Price <- log(new.cars$Price)
new.cars$log.MPG.city <- log(new.cars$MPG.city)
new.cars$log.MPG.highway <- log(new.cars$MPG.highway)
new.cars$log.EngineSize <- log(new.cars$EngineSize)
new.cars$log.Horsepower <- log(new.cars$Horsepower)

# ordinal variables must be coded as ordered factors
new.cars$Cylinders <- ordered(new.cars$Cylinders)
new.cars$Origin <- ordered(new.cars$Origin)

# constraint names must match the manifest variable names (Weight, not weight)
post <- MCMCmixfactanal(~log.Price+log.MPG.city+
                        log.MPG.highway+Cylinders+log.EngineSize+
                        log.Horsepower+RPM+Length+
                        Wheelbase+Width+Weight+Origin, data=new.cars,
                        lambda.constraints=list(log.Horsepower=list(2,"+"),
                                                log.Horsepower=c(3,0),
                                                Weight=list(3,"+")),
                        factors=2,
                        burnin=5000, mcmc=500000, thin=100, verbose=500,
                        L0=.25, tune=3.0)
plot(post)
summary(post)
## End(Not run)