5.LinearModels {limma}    R Documentation

Linear Models for Microarrays

Description

This page gives an overview of the LIMMA functions available to fit linear models and to interpret the results.

The core of this package is the fitting of gene-wise linear models to microarray data. The basic idea is to estimate log-ratios between two or more target RNA samples simultaneously. See the LIMMA User's Guide for several case studies.

Forming the Design Matrix

The function modelMatrix is provided to assist with creation of an appropriate design matrix for two-color microarray experiments using a common reference. Design matrices for Affymetrix or other single-color arrays can be created using the function model.matrix, which is part of the standard R distribution. For direct two-color designs, the design matrix often needs to be created by hand.
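
As an illustration of the single-color case, the following sketch builds a design matrix with model.matrix. The experiment (six arrays in two groups named WT and Mut) is hypothetical and serves only to show the shape of the result.

```r
# Hypothetical experiment: six single-color arrays, three per group.
group <- factor(c("WT", "WT", "WT", "Mut", "Mut", "Mut"),
                levels = c("WT", "Mut"))

# One column per group (no intercept), so each coefficient is a group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
print(design)
```

Dropping the intercept is one choice among several; a design with an intercept and a Mut-versus-WT column describes the same model with differently interpreted coefficients.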

Fitting Models

There are four main functions in the package which fit linear models:

lmFit
This is a high-level function which accepts microarray data objects and provides an entry point to the following three functions.
lm.series
Straightforward least squares fitting of a linear model for each gene.
rlm.series
An alternative to lm.series using robust regression as implemented by the rlm function in the MASS package.
gls.series
Generalized least squares taking into account correlations between duplicate spots (i.e., replicate spots on the same array) or between technical replicates. The function duplicateCorrelation is used to estimate the inter-duplicate correlation before using gls.series.

Each of these functions accepts essentially the same argument list and produces a fitted model object of the same form. The first function lmFit formally produces an object of class MArrayLM. The other three functions are lower level functions which produce similar output but in unclassed lists.

The main argument is the design matrix which specifies which target RNA samples were applied to each channel on each array. There is considerable freedom to choose the design matrix - there is always more than one choice which is correct provided it is interpreted correctly. The fitted model object consists of coefficients, standard errors and residual standard errors for each gene.
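
The idea of a gene-wise fit can be sketched in base R as follows. This is not the limma implementation: the helper fit_gene, the simulated log-ratios and the four-array common-reference design are assumptions made for illustration, and the component names in the returned list are chosen here for readability.

```r
set.seed(1)

# Hypothetical common-reference design: arrays 1-2 compare A with the
# reference, arrays 3-4 compare B with the reference.
design <- cbind(A = c(1, 1, 0, 0), B = c(0, 0, 1, 1))

# Simulated log-ratios: one row per gene, one column per array.
n_genes <- 5
M <- matrix(rnorm(n_genes * nrow(design)), nrow = n_genes)

# Ordinary least squares for a single gene (illustrative helper).
fit_gene <- function(y, X) {
  fit <- lm.fit(X, y)
  df <- fit$df.residual
  list(coefficients   = fit$coefficients,
       # Standard errors before multiplying by the residual standard deviation.
       stdev.unscaled = sqrt(diag(solve(crossprod(X)))),
       sigma          = sqrt(sum(fit$residuals^2) / df),
       df.residual    = df)
}

fits <- lapply(seq_len(nrow(M)), function(g) fit_gene(M[g, ], design))

# Collect the per-gene results into a genes-by-coefficients matrix and a vector.
coefficients <- t(sapply(fits, function(f) f$coefficients))
sigma <- sapply(fits, function(f) f$sigma)
```

With this design each coefficient is simply the mean log-ratio over the arrays on which that target was hybridized.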

All the functions which fit linear models use unwrapdups, which provides a unified method for handling duplicate spots.

All the above linear modeling functions accept two-color data in terms of log-ratios. See 6.SingleChannel for the modeling of two-color data in terms of the individual log-intensities.

Making Comparisons of Interest

Once a linear model has been fit using an appropriate design matrix, the command makeContrasts may be used to form a contrast matrix to make comparisons of interest. The fit and the contrast matrix are used by contrasts.fit to compute fold changes and t-statistics for the contrasts of interest. This is a way to compute, for example, all possible pairwise comparisons between treatments in an experiment which compares many treatments to a common reference.
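
The arithmetic behind a contrast can be sketched in base R. The coefficient values, the design and the contrast name BvsA below are hypothetical, and the code only shows the matrix algebra, not the limma functions themselves.

```r
# Hypothetical coefficients (log-ratios versus the common reference) for three genes.
coefficients <- rbind(gene1 = c(A = 1.0, B = 2.5),
                      gene2 = c(A = -0.5, B = -0.5),
                      gene3 = c(A = 0.2, B = -1.0))

# Contrast matrix with a single column: B minus A.
contrasts <- cbind(BvsA = c(A = -1, B = 1))

# Log fold changes for the contrast: one row per gene.
contrast_coef <- coefficients %*% contrasts

# Unscaled standard error of the contrast for the assumed design,
# from the unscaled covariance matrix of the coefficient estimates.
design <- cbind(A = c(1, 1, 0, 0), B = c(0, 0, 1, 1))
cov_unscaled <- solve(crossprod(design))
contrast_stdev_unscaled <- sqrt(diag(t(contrasts) %*% cov_unscaled %*% contrasts))
```

The comparison of B with A never needs a direct hybridization; it is recovered from the two comparisons against the reference.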

Assessing Differential Expression

After fitting a linear model, the standard errors are moderated using a simple empirical Bayes model using ebayes or eBayes. A moderated t-statistic and a log-odds of differential expression are computed for each contrast for each gene.
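
The moderation step can be sketched as below, following the posterior variance formula in Smyth (2004). The function moderated_t is an illustrative helper, not part of limma, and the prior variance s0_sq and prior degrees of freedom d0 are supplied here as fixed hypothetical values, whereas the package estimates them from the data.

```r
# Shrink each gene's residual variance toward a prior value and form a
# moderated t-statistic (illustrative helper; prior values are assumed known).
moderated_t <- function(coef, stdev_unscaled, sigma, df_residual, s0_sq, d0) {
  # Posterior variance: weighted average of prior and sample variances.
  s2_post <- (d0 * s0_sq + df_residual * sigma^2) / (d0 + df_residual)
  t_mod <- coef / (stdev_unscaled * sqrt(s2_post))
  # The moderated statistic gains the prior degrees of freedom.
  df_total <- d0 + df_residual
  list(s2.post  = s2_post,
       t        = t_mod,
       df.total = df_total,
       p.value  = 2 * pt(-abs(t_mod), df = df_total))
}

res <- moderated_t(coef = c(1.5, 0.1), stdev_unscaled = c(1, 1),
                   sigma = c(0.1, 0.5), df_residual = 2,
                   s0_sq = 0.25, d0 = 4)
```

The first gene has a very small sample variance, which would inflate an ordinary t-statistic; the shrinkage pulls its variance up toward the prior value.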

ebayes and eBayes use internal functions fitFDist, tmixture.matrix and tmixture.vector.

The function zscoreT is sometimes used for computing z-score equivalents for t-statistics so as to place t-statistics with different degrees of freedom on the same scale. zscoreGamma is used the same way with standard deviations instead of t-statistics. These functions are for research purposes rather than for routine use.
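
One way to obtain a z-score equivalent of a t-statistic is to match tail probabilities, as in the sketch below. The helper z_from_t is an assumption made for illustration and is not the limma source.

```r
# Convert t-statistics to z-scores with the same tail probability
# (illustrative helper).
z_from_t <- function(t, df) {
  # Work in the lower tail on the log scale for accuracy with large |t|,
  # then restore the sign of the original statistic.
  z <- qnorm(pt(-abs(t), df = df, log.p = TRUE), log.p = TRUE)
  -sign(t) * z
}

z_from_t(c(-3, 0, 3), df = 4)
```

With few degrees of freedom the z-score is noticeably smaller in magnitude than the t-statistic, and the two agree closely as the degrees of freedom grow.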

Summarizing Model Fits

After the above steps the results may be displayed or further processed using:

toptable or topTable
Presents a list of the genes most likely to be differentially expressed for a given contrast.
classifyTestsF
Uses nested F-tests to classify the genes as up, down or even over the contrasts in the linear model with special attention to genes which are significant in more than one contrast. classifyTestsT and classifyTestsP are simpler methods using cutoffs for the t-statistics or p-values individually.
FStat
Computes an overall moderated F-statistic to test whether all the contrasts are equal to zero.
heatdiagram or heatDiagram
Allows visual comparison of the results across many different conditions in the linear model. Not the same as heatdiagrams produced by other packages! This function accepts a TestResults matrix produced by classifyTests.
vennCounts
Accepts output from classifyTests and counts the number of genes in each classification.
vennDiagram
Accepts output from classifyTests or vennCounts and produces a Venn diagram plot.
write.fit
Writes an MArrayLM object to a file.
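
The kind of tally that underlies a Venn diagram can be sketched in base R. The results matrix below, coded 1 for up, -1 for down and 0 otherwise, and the contrast names c1 and c2 are hypothetical stand-ins for the output of a classification step.

```r
# Hypothetical classification of five genes over two contrasts.
results <- cbind(c1 = c(1, 0, -1, 1, 0),
                 c2 = c(1, 0, 0, -1, 0))

# A gene counts as significant for a contrast if it is up or down.
significant <- results != 0

# Cross-tabulate significance in the two contrasts.
counts <- table(c1 = significant[, "c1"], c2 = significant[, "c2"])
print(counts)
```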

When evaluating test procedures with simulated or known results, the utility function auROC can be used to compute the area under the receiver operating characteristic (ROC) curve for the test results for a given probe.
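
The area under the ROC curve can be computed from ranks, as in the following sketch. The helper au_roc is an assumption made for illustration, using the Mann-Whitney rank-sum identity, and is not the limma source; truth is taken to be a 0/1 vector and stat a statistic where larger values indicate stronger evidence.

```r
# Area under the ROC curve via the Mann-Whitney rank-sum identity
# (illustrative helper).
au_roc <- function(truth, stat) {
  pos <- stat[truth == 1]
  neg <- stat[truth == 0]
  r <- rank(c(pos, neg))
  n_pos <- length(pos)
  # Rank sum of the positives, minus its minimum possible value,
  # scaled by the number of positive-negative pairs.
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * length(neg))
}

au_roc(truth = c(1, 0, 1, 0), stat = c(4, 3, 2, 1))
```

A value of 1 means every true positive outranks every true negative, and a value near 0.5 means the statistic ranks no better than chance.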

Author(s)

Gordon Smyth

References

Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3, No. 1, Article 3. http://www.bepress.com/sagmb/vol3/iss1/art3

Smyth, G. K., Michaud, J., and Scott, H. (2003). The use of within-array duplicate spots for assessing differential expression in microarray experiments. http://www.statsci.org/smyth/pubs/dupcor.pdf


[Package limma version 1.6.7 Index]