DNA microarrays allow researchers to measure the
coexpression of thousands of genes, and to identify changes across
experimental conditions. Recently, many studies have shifted from
tabulating the effects of individual genes to the effects of
groups of genes that share biological features. We define a
general framework for gene category testing, and show that most
existing methods can be presented as a contrast of the
differential expression within a category to that of the
complementary set of genes on the array. Our framework includes
post hoc tests that look for overrepresentation of the category in
a list of significant associations, and methods that consider
quantitative measures of differential expression for all genes.
We divide existing gene category tests into two classes. Class 1
tests are the most commonly used; they assume that gene-specific
measures of differential expression are independent, despite
overwhelming evidence of positive correlation. We provide analytic
results and
simulations based on real microarray data to demonstrate that
Class 1 tests are strongly anti-conservative. Class 2 tests use
array permutation to account for correlation in expression, and by
construction have proper Type I error control. We have previously
introduced a general framework for Class 2 procedures called
Significance Analysis of Function and Expression (SAFE). Both
classes of tests assume or induce a null where all genes have the
same degree of differential expression, which may not be
biologically reasonable.
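
To make the array permutation idea behind Class 2 tests concrete, here is a minimal Python sketch. It is an illustration under stated assumptions and not the SAFE implementation: the function names, the use of a Welch t-statistic as the gene-specific measure, and the choice of "mean |t| in the category minus mean |t| in the complement" as the contrast are all hypothetical. What it shows is that permuting array labels, rather than genes, leaves the gene-gene correlation in each permuted data set intact.

```python
import numpy as np

def t_stats(x, labels):
    """Welch t-statistic per gene. x is genes x arrays; labels is a 0/1 array."""
    a, b = x[:, labels == 0], x[:, labels == 1]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    return (b.mean(axis=1) - a.mean(axis=1)) / se

def category_contrast(t, in_cat):
    """Illustrative contrast: mean |t| in the category minus mean |t| outside it."""
    return np.abs(t[in_cat]).mean() - np.abs(t[~in_cat]).mean()

def array_permutation_pvalue(x, labels, in_cat, n_perm=1000, seed=0):
    """One-sided p-value from permuting array (column) labels.

    Whole arrays are relabelled, so every gene in an array moves together
    and the correlation between genes is preserved under permutation.
    """
    rng = np.random.default_rng(seed)
    observed = category_contrast(t_stats(x, labels), in_cat)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)  # shuffle which arrays belong to which group
        if category_contrast(t_stats(x, perm), in_cat) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

A call such as `array_permutation_pvalue(x, labels, in_cat)` takes an expression matrix, a 0/1 vector of array labels, and a boolean vector marking category membership. A Class 1 analogue would instead permute or resample genes, which treats the gene-specific statistics as independent.
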
We introduce a more sensible and general (Class 3) null
hypothesis, which states that the profile of differential
expression is the same within the category as for the entire
array. Under this
broader null, we show that Class 2 tests tend to be conservative.
We present a bootstrap approach to test for departures from the
Class 3 null, and use simulations and real microarray data to
demonstrate that it provides valid Type I error control and more
power than Class 2 tests. If time permits, we will discuss
extensions of our testing approaches to groups of genes with
shared transcription factor motifs.
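
One way to picture a bootstrap test of this kind is sketched below in Python. This is an assumption-laden illustration, not the procedure presented in the talk: the function names, the use of a per-gene difference in group means, the comparison of the category with its complement, and the normal approximation for the p-value are all hypothetical choices. Arrays are resampled with replacement within each group, so gene-gene correlation is retained, and the null being tested is that the category's average differential expression matches that of the remaining genes, not that it is zero.

```python
import math
import numpy as np

def mean_diff(x, labels):
    """Per-gene difference in group means. x is genes x arrays; labels is 0/1."""
    return x[:, labels == 1].mean(axis=1) - x[:, labels == 0].mean(axis=1)

def contrast(d, in_cat):
    """Average differential expression in the category minus that outside it."""
    return d[in_cat].mean() - d[~in_cat].mean()

def bootstrap_category_test(x, labels, in_cat, n_boot=1000, seed=0):
    """Bootstrap standard error of the contrast and a two-sided normal p-value."""
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(labels == 0)
    idx1 = np.flatnonzero(labels == 1)
    observed = contrast(mean_diff(x, labels), in_cat)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole arrays within each group, with replacement.
        cols = np.concatenate([
            rng.choice(idx0, size=idx0.size, replace=True),
            rng.choice(idx1, size=idx1.size, replace=True),
        ])
        boot[b] = contrast(mean_diff(x[:, cols], labels[cols]), in_cat)
    se = boot.std(ddof=1)
    z = observed / se
    # Two-sided p-value under a normal approximation to the contrast.
    p = math.erfc(abs(z) / math.sqrt(2))
    return observed, se, p
```

Because the contrast is centred on the complement's average, a data set in which every gene is shifted by the same amount gives a contrast near zero, which is the situation the broader null is meant to allow.
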