vsn {vsn} | R Documentation |
Robust estimation of variance-stabilizing and calibrating transformations for microarray data. This is the main function of this package; see also the vignette vsn.pdf.
vsn(intensities, lts.quantile = 0.5, verbose = TRUE, niter = 10, cvg.check = NULL, describe.preprocessing = TRUE, pstart, strata)
intensities |
An object that contains intensity values from
a microarray experiment. See
getIntensityMatrix for details.
The intensities are assumed to be the raw
scanner data, summarized over the spots by an image analysis program,
and possibly "background subtracted".
The intensities must not be logarithmically or otherwise transformed,
and not thresholded or "floored". NAs are not accepted.
See details. |
lts.quantile |
Numeric. The quantile that is used for the resistant least trimmed sum of squares regression. Allowed values are between 0.5 and 1. A value of 1 corresponds to ordinary least squares regression, i.e. no data points are trimmed. |
verbose |
Logical. If TRUE, some messages are printed. |
niter |
Integer. The number of iterations to be used in the least trimmed sum of squares regression. |
cvg.check |
List. If non-NULL, this allows finer control of the iterative least trimmed sum of squares regression. See details. |
pstart |
Array. If supplied, it specifies the start values
for the iterative parameter estimation algorithm. See
vsnh for details. |
describe.preprocessing |
Logical. If TRUE, calibration and
transformation parameters, plus some other information are stored in
the preprocessing slot of the returned object. See details. |
strata |
Integer vector. Its length must be the same as nrow(intensities).
This parameter allows for the calibration and error model parameters
to be stratified within each array, e.g. to take into account probe
sequence properties, print-tip or plate effects.
If strata is not specified, one pair of parameters is fitted
for every sample (i.e. for every column of intensities). If
strata is specified, a pair of parameters is fitted for every
stratum within every sample. The strata are coded by distinct
integer values. The integer vector strata can be obtained
from a factor fac through as.integer(fac), or from
a character vector str through as.integer(factor(str)). |
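As a minimal sketch of the coding described for strata, using only base R (the labels and the names strata_from_factor and strata_from_string are illustrative and not part of the package):

```r
# Hypothetical print-tip and plate labels for six probes (illustrative only).
fac <- factor(c("tipA", "tipB", "tipA", "tipC", "tipB", "tipC"))
str <- c("plate2", "plate1", "plate2", "plate1", "plate1", "plate2")

# A factor maps directly to its integer level codes.
strata_from_factor <- as.integer(fac)

# A character vector has to be converted to a factor first;
# as.integer(str) alone would give NA for non-numeric strings.
strata_from_string <- as.integer(factor(str))

print(strata_from_factor)
print(strata_from_string)
```

Either vector could then be passed as strata, provided its length equals nrow(intensities).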
Overview:
The function calibrates for sample-to-sample variations through
shifting and scaling, and transforms the intensities to a scale where
the variance is approximately independent of the mean intensity.
The variance stabilizing transformation is equivalent to the
natural logarithm in the high-intensity range, and to a
linear transformation in the low-intensity range. In an intermediate
range, the arsinh function interpolates smoothly between the
two. For details on the transformation, please see the help page for
vsnh. The parameters are estimated through
a robust variant of maximum likelihood. This assumes that for
the majority of genes the expression levels are not much different
across the samples, i.e., that only a minority of genes (less than
a fraction 1-lts.quantile) is differentially expressed.
Even if most genes on an array are differentially expressed, it may still
be possible to use the estimator: if a set of non-differentially expressed
genes is known, e.g. because they are external controls or reliable
'house-keeping genes', the transformation parameters can be fitted with
vsn from the data of these genes, then the transformation can be
applied to all data with vsnh.
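The two limiting regimes of the arsinh function can be checked numerically. The following sketch uses the plain base R function asinh and leaves out the per-sample calibration parameters (see vsnh for the parametrization actually used); the intensity values are illustrative.

```r
# Intensities spanning the low, intermediate, and high range (illustrative).
x <- c(0.001, 0.01, 1, 100, 1e4, 1e6)

h      <- asinh(x)     # inverse hyperbolic sine, part of base R
linear <- x            # low-intensity approximation:  asinh(x) ~ x
logish <- log(2 * x)   # high-intensity approximation: asinh(x) ~ log(2x)

print(data.frame(x = x, asinh = h, linear = linear, log2x = logish))

# Approximation errors at the two ends of the range.
err_high <- abs(h[x >= 1e4]  - logish[x >= 1e4])
err_low  <- abs(h[x <= 0.01] - linear[x <= 0.01])
print(err_high)
print(err_low)
```

In between, around x = 1, neither approximation is accurate, which is the range where arsinh interpolates between the two.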
Format: The format of the matrix of intensities is as follows:
for the two-color printed array technology, each row
corresponds to one spot, and the columns to the different arrays
and wavelengths (usually red and green, but could be any number).
For example, if there are 10 arrays, the matrix would have 20 columns,
columns 1...10 containing the green intensities, and 11...20 the
red ones. In fact, the ordering of the columns does not matter to
vsn, but it is your responsibility to keep track of it for
subsequent analyses.
For one-color arrays, each row corresponds to a probe, and each
column to an array.
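One way to keep track of the column ordering is to name the columns when the matrix is assembled. The sketch below builds a two-color matrix for 10 arrays from simulated values; the data, the dimensions, and the column naming scheme are assumptions made for illustration.

```r
set.seed(1)
n_spots  <- 5    # rows: one per spot (kept tiny for illustration)
n_arrays <- 10

# Simulated raw intensities (hypothetical values, not real data).
green <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)
red   <- matrix(rexp(n_spots * n_arrays, rate = 1 / 500), nrow = n_spots)

# Columns 1..10 hold the green channel, columns 11..20 the red channel.
intensities <- cbind(green, red)
colnames(intensities) <- c(paste0("green_", 1:n_arrays),
                           paste0("red_",   1:n_arrays))

print(dim(intensities))
print(colnames(intensities))
```

With named columns, the channel and array of each column can be recovered later regardless of how the columns were ordered.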
Performance: This function is slow. This is due to the nested
iteration loops of the numerical optimization of the likelihood function
and the heuristic that identifies the non-outlying data points in the
least trimmed squares regression. For large arrays with many tens of
thousands of probes, you may want to consider random subsetting: that is,
use only a random subset of, e.g., 10,000-20,000 rows of the data matrix
intensities to fit the parameters, then apply the transformation
to all the data using vsnh. An example of this can be
seen in the function normalize.AffyBatch.vsn, whose code
you can inspect by typing normalize.AffyBatch.vsn on the R
command line.
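The row selection step of random subsetting can be sketched in base R as follows. The matrix is simulated, the subset size of 15000 is an arbitrary choice within the range mentioned above, and the package calls are shown only as comments that follow the pattern of the Examples on this page; this is not the code of normalize.AffyBatch.vsn.

```r
set.seed(42)

# Stand-in for a large raw intensity matrix (hypothetical data).
intensities <- matrix(rexp(50000 * 4, rate = 1 / 1000), ncol = 4)

# Draw a random subset of rows to be used for parameter fitting.
n_fit <- min(15000, nrow(intensities))
sel   <- sample(nrow(intensities), n_fit)
fit_subset <- intensities[sel, , drop = FALSE]

# With the package loaded, the remaining steps would be (not executed here):
#   fit    <- vsn(fit_subset)
#   params <- preproc(description(fit))$vsnParams
#   hall   <- vsnh(intensities, params)
print(dim(fit_subset))
```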
Iteration control:
By default, if cvg.check is NULL, the function will run
the fixed number niter of iterations in the least trimmed sum
of squares regression. More fine-grained control can be obtained by
passing a list with elements eps and n. If the maximum
change between transformed data values is smaller than eps for
n consecutive iterations, then the iteration terminates.
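The stopping rule can be illustrated with a small stand-alone function. This is a sketch of the rule as described here, not the package's internal code; the function name stop_iteration and the trajectory of changes are hypothetical.

```r
# Returns the iteration at which the rule would stop, or NA if it never fires.
# 'changes' holds the maximum change in transformed values at each iteration.
stop_iteration <- function(changes, eps, n) {
  run <- 0
  for (i in seq_along(changes)) {
    # Count consecutive iterations below eps; reset on any larger change.
    run <- if (changes[i] < eps) run + 1 else 0
    if (run >= n) return(i)
  }
  NA_integer_
}

# Hypothetical trajectory of maximum changes over 8 iterations.
changes <- c(0.5, 0.1, 0.004, 0.02, 0.003, 0.002, 0.001, 0.0005)
stopped <- stop_iteration(changes, eps = 0.005, n = 3)
print(stopped)
```

The corresponding argument would be a list such as cvg.check = list(eps = 0.005, n = 3), where the numeric values are illustrative rather than recommended settings.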
Estimated transformation parameters:
If describe.preprocessing is TRUE, the transformation
parameters are returned in the preprocessing slot of the
description slot of the resulting exprSet object, in the form
of a list with three elements:
vsnParams: the parameter array (see vsnh for details).
vsnParamsIter: an array with dimensions c(dim(vsnParams), niter)
that contains the parameter trajectory during the iterative fit
process (see also vsnPlotPar).
vsnTrimSelection: a logical vector that, for each row of the
intensities matrix, reports whether it was below (TRUE) or
above (FALSE) the trimming threshold.
If intensities has class exprSet, and its description slot has
class MIAME, then this list is appended to any existing entries in
the preprocessing slot. Otherwise, the description object and its
preprocessing slot are created.
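To make the meaning of a trim selection vector concrete, the following base R sketch flags rows whose residual lies at or below the lts.quantile quantile. It is a deliberately simplified assumption: the residuals are simulated, and the heuristic used by the package to identify non-outlying data points may differ in its details.

```r
set.seed(7)
lts.quantile <- 0.5

# Hypothetical per-row residual measure (simulated, not from a real fit).
resid <- rchisq(1000, df = 1)

# Rows at or below the quantile threshold would be kept for fitting.
threshold     <- quantile(resid, probs = lts.quantile)
trimSelection <- resid <= threshold   # TRUE = below threshold, FALSE = above

print(mean(trimSelection))
```

With lts.quantile = 0.5, about half of the rows are flagged TRUE; with a value of 1, all rows would be retained.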
An object of class exprSet.
Differences between the columns of the transformed intensities may be
interpreted as "generalized log-ratios". For the transformation parameters,
please see details.
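The relation between generalized log-ratios and ordinary log-ratios can be sketched with the plain asinh function. The calibration parameters are again left out, and the intensity values with their twofold difference are illustrative.

```r
# Two hypothetical calibrated intensity columns with a twofold difference.
x1 <- c(0.1, 1, 10, 1000, 1e5)
x2 <- x1 / 2

glog_ratio <- asinh(x1) - asinh(x2)   # difference on the transformed scale
log_ratio  <- log(x1 / x2)            # ordinary log-ratio, log(2) in every row

print(data.frame(x1 = x1, glog_ratio = glog_ratio, log_ratio = log_ratio))
```

At high intensities the two quantities agree closely, while at low intensities the generalized log-ratio is smaller in magnitude than the ordinary log-ratio.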
Wolfgang Huber http://www.dkfz.de/abt0840/whuber
Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, Martin Vingron; Bioinformatics (2002) 18 Suppl.1 S96-S104.
Parameter estimation for the calibration and variance stabilization of microarray data, Wolfgang Huber, Anja von Heydebreck, Holger Sueltmann, Annemarie Poustka, and Martin Vingron; Statistical Applications in Genetics and Molecular Biology (2003) Vol. 2 No. 1, Article 3. http://www.bepress.com/sagmb/vol2/iss1/art3.
vsnh, vsnPlotPar, exprSet-class, MIAME-class, normalize.AffyBatch.vsn
data(kidney)
if(interactive()) {
  x11(width=9, height=4.5)
  par(mfrow=c(1,2))
}
plot(log.na(exprs(kidney)), pch=".", main="log-log")
vsnkid = vsn(kidney)   ## transform and calibrate
plot(exprs(vsnkid), pch=".", main="h-h")

if (interactive()) {
  x11(width=9, height=4)
  par(mfrow=c(1,3))
}
meanSdPlot(vsnkid)
vsnPlotPar(vsnkid, "factors")
vsnPlotPar(vsnkid, "offsets")

## this should always hold true
params = preproc(description(vsnkid))$vsnParams
stopifnot(all(vsnh(exprs(kidney), params) == exprs(vsnkid)))