1. Kerr, M. K. Design considerations for efficient and effective microarray studies. Biometrics 59, 822–828 (2003).

  2. Page, G. P., Edwards, J. W., Barnes, S., Weindruch, R. & Allison, D. B. A design and statistical perspective on microarray gene expression studies in nutrition: the need for playful creativity and scientific hard-mindedness. Nutrition 19, 997–1000 (2003).

  3. Yang, M. C., Yang, J. J., McIndoe, R. A. & She, J. X. Microarray experimental design: power and sample size considerations. Physiol. Genomics 16, 24–28 (2003).

  4. Kerr, M. K. & Churchill, G. A. Experimental design for gene expression microarrays. Biostatistics 2, 183–201 (2001).

  5. Dobbin, K., Shih, J. H. & Simon, R. Statistical design of reverse dye microarrays. Bioinformatics 19, 803–810 (2003).

  6. Churchill, G. A. Fundamentals of experimental design for cDNA microarrays. Nature Genet. 32, S490–S495 (2002).

  7. Yang, Y. H. & Speed, T. Design issues for cDNA microarray experiments. Nature Rev. Genet. 3, 579–588 (2002).

  8. Allison, D. B., Allison, R. L., Faith, M. S., Paultre, F. & Pi-Sunyer, F. X. Power and money: designing statistically powerful studies while minimizing financial costs. Psychol. Methods 2, 20–33 (1997).

  9. Allison, D. B. et al. A mixture model approach for the analysis of microarray gene expression data. Comput. Stat. Data Analysis 39, 1–20 (2002). This was the first paper in the field of microarray research to introduce mixture modelling.

  10. Pavlidis, P., Li, Q. & Noble, W. S. The effect of replication on gene expression microarray experiments. Bioinformatics 19, 1620–1627 (2003).

  11. Tsai, C. A., Hsueh, H. M. & Chen, J. J. Estimation of false discovery rates in multiple testing: application to gene microarray data. Biometrics 59, 1071–1081 (2003).

  12. Pan, W., Lin, J. & Le, C. T. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol. 3, research0022 (2002).

  13. Zien, A., Fluck, J., Zimmer, R. & Lengauer, T. Microarrays: how many do you need? J. Comput. Biol. 10, 653–667 (2003).

  14. Gadbury, G. L. et al. Power analysis and sample size estimation in the age of high dimensional biology: a parametric bootstrap approach and examples from microarray research. Stat. Methods Med. Res. 13, 325–338 (2004). This paper offers convenient FDR-based methods for power analysis and sample-size estimation in microarray and other high-dimensional testing situations.

  15. Pawitan, Y., Michiels, S., Koscielny, S., Gusnanto, A. & Ploner, A. False discovery rate, sensitivity and sample size for microarray studies. Bioinformatics 21, 3017–3024 (2005).

  16. Muller, P., Parmigiani, G., Robert, C. & Rousseau, J. Optimal sample size for multiple testing: The case of gene expression microarrays. J. Am. Stat. Assoc. 99, 990–1001 (2004).

  17. Dobbin, K. & Simon, R. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6, 27–38 (2005).

  18. Garge, N., Page, G. P., Sprague, A. P., Gorman, B. S. & Allison, D. B. Reproducible clusters from microarray research: whither? BMC Bioinformatics 6 (Suppl. 2), S10 (2005). The authors evaluate clustering techniques using real data, and find that with sample sizes of less than 50, the reproducibility of results is poor.

  19. Kendziorski, C. M., Zhang, Y., Lan, H. & Attie, A. D. The efficiency of pooling mRNA in microarray experiments. Biostatistics 4, 465–477 (2003). This paper clarifies concepts and statistical design issues that are involved with mRNA pooling in microarray experiments.

  20. Kendziorski, C., Irizarry, R. A., Chen, K. S., Haag, J. D. & Gould, M. N. On the utility of pooling biological samples in microarray experiments. Proc. Natl Acad. Sci. USA 102, 4252–4257 (2005).

  21. Chen, Y., Dougherty, E. R. & Bittner, M. L. Ratio-based decisions and the quantitative analysis of cDNA microarray images. J. Biomed. Opt. 2, 364–374 (1997).

  22. Schadt, E. E., Li, C., Ellis, B. & Wong, W. H. Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data. J. Cell Biochem. Suppl. 37, 120–125 (2001).

  23. Ekstrom, C. T., Bak, S., Kristensen, C. & Rudemo, M. Spot shape modelling and data transformations for microarrays. Bioinformatics 20, 2270–2278 (2004).

  24. Steinfath, M. et al. Automated image analysis for array hybridization experiments. Bioinformatics 17, 634–641 (2001).

  25. Yang, Y. H., Buckley, M. J. & Speed, T. P. Analysis of cDNA microarray images. Brief Bioinform. 2, 341–349 (2001).

  26. Quackenbush, J. Microarray data normalization and transformation. Nature Genet. 32, 496–501 (2002).

  27. Yang, Y. H. et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30, e15 (2002).

  28. Smyth, G. K. & Speed, T. Normalization of cDNA microarray data. Methods 31, 265–273 (2003).

  29. Qin, L. X. & Kerr, K. F. Empirical evaluation of data transformations and ranking statistics for microarray analysis. Nucleic Acids Res. 32, 5471–5479 (2004). This article presents the effect of different image-processing and normalization techniques on microarray analysis conclusions.

  30. Affymetrix. Affymetrix Expression Analysis Technical Manual (Affymetrix, Santa Clara, California, 2004).

  31. Nielsen, H. B., Gautier, L. & Knudsen, S. Implementation of a gene expression index calculation method based on the PDNN model. Bioinformatics 21, 687–688 (2005).

  32. Irizarry, R. A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).

  33. Mehta, T., Tanik, M. & Allison, D. B. Towards sound epistemological foundations of statistical methods for high-dimensional biology. Nature Genet. 36, 943–947 (2004). This paper clarifies the importance of methods for evaluating the validity of proposed statistical methodologies in high-dimensional biology, with an emphasis on microarray research.

  34. Bolstad, B. M., Irizarry, R. A., Astrand, M. & Speed, T. P. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185–193 (2003).

  35. Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M. & Halfon, M. S. Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 6, R16 (2005).

  36. Cope, L. M., Irizarry, R. A., Jaffee, H. A., Wu, Z. & Speed, T. P. A benchmark for Affymetrix GeneChip expression measures. Bioinformatics 20, 323–331 (2004).

  37. Chen, D. T. A graphical approach for quality control of oligonucleotide array data. J. Biopharm. Stat. 14, 591–606 (2004).

  38. Hsiao, A., Worrall, D. S., Olefsky, J. M. & Subramaniam, S. Variance-modeled posterior inference of microarray data: detecting gene-expression changes in 3T3-L1 adipocytes. Bioinformatics 20, 3108–3127 (2004).

  39. Miller, R. A., Galecki, A. & Shmookler-Reis, R. J. Interpretation, design, and analysis of gene array expression experiments. J. Gerontol. A 56, B52–B57 (2001).

  40. Budhraja, V., Spitznagel, E., Schaiff, W. T. & Sadovsky, Y. Incorporation of gene-specific variability improves expression analysis using high-density DNA microarrays. BMC Biol. 1, 1 (2003).

  41. Cui, X., Hwang, J. T., Qiu, J., Blades, N. J. & Churchill, G. A. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59–75 (2005). This article provides one method of shrinkage and compares its performance with other variance shrinkage methods.

  42. Tusher, V. G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA 98, 5116–5121 (2001).

  43. Baldi, P. & Long, A. D. A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 17, 509–519 (2001).

  44. Edwards, J. W. et al. Empirical Bayes estimation of gene-specific effects in micro-array research. Funct. Integr. Genomics 5, 32–39 (2005).

  45. Ge, Y. C., Dudoit, S. & Speed, T. P. Resampling-based multiple testing for microarray data analysis. Test 12, 1–77 (2003).

  46. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate — a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).

  47. Hsueh, H. M., Chen, J. J. & Kodell, R. L. Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J. Biopharm. Stat. 13, 675–689 (2003).

  48. van der Laan, M. J., Dudoit, S. & Pollard, K. S. Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives. Stat. Appl. Genet. Mol. Biol. 3, A15 (2004).

  49. Storey, J. D. The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Stat. 31, 2013–2035 (2003). This paper clarifies the key terminology and concepts used in FDR-related methods.

  50. Do, K. A., Mueller, P. & Tang, F. A nonparametric Bayesian mixture model for gene expression. J. R. Stat. Soc. Ser. C 54, 1–18 (2005).

  51. Pounds, S. & Morris, S. W. Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of p-values. Bioinformatics 19, 1236–1242 (2003).

  52. Datta, S. & Datta, S. Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics 21, 1987–1994 (2005).

  53. Efron, B., Tibshirani, R., Storey, J. D. & Tusher, V. G. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 96, 1151–1160 (2001).

  54. Newton, M. A., Noueiry, A., Sarkar, D. & Ahlquist, P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5, 155–176 (2004).

  55. Newton, M. A., Kendziorski, C. M., Richmond, C. S., Blattner, F. R. & Tsui, K. W. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 8, 37–52 (2001).

  56. Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature Genet. 34, 267–273 (2003).

  57. Osier, M. V. in DNA Microarrays and Statistical Genomic Techniques: Design, Analysis, and Interpretation of Experiments (Marcel Dekker, New York, 2005).

  58. Osier, M. V., Zhao, H. & Cheung, K. H. Handling multiple testing while interpreting microarrays with the Gene Ontology Database. BMC Bioinformatics 5, 124 (2004).

  59. Khatri, P., Draghici, S., Ostermeier, G. C. & Krawetz, S. A. Profiling gene expression using Onto-Express. Genomics 79, 266–270 (2002).

  60. Pavlidis, P., Weston, J., Cai, J. & Noble, W. S. Learning gene functional classifications from multiple data types. J. Comput. Biol. 9, 401–411 (2002).

  61. Pavlidis, P., Qin, J., Arango, V., Mann, J. J. & Sibille, E. Using the gene ontology for microarray data mining: a comparison of methods and application to age effects in human prefrontal cortex. Neurochem. Res. 29, 1213–1222 (2004). This study introduces a gene-class testing method that uses the full continuous evidence that is available within p-values.

  62. Ben Shaul, Y., Bergman, H. & Soreq, H. Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression. Bioinformatics 21, 1129–1137 (2005).

  63. Zeeberg, B. R. et al. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 4, R28 (2003).

  64. Damian, D. & Gorfine, M. Statistical concerns about the GSEA procedure. Nature Genet. 36, 663 (2004).

  65. Persson, S., Wei, H., Milne, J., Page, G. P. & Somerville, C. R. Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets. Proc. Natl Acad. Sci. USA 102, 8633–8638 (2005).

  66. Kyng, K. J., May, A., Kolvraa, S. & Bohr, V. A. Gene expression profiling in Werner syndrome closely resembles that of normal aging. Proc. Natl Acad. Sci. USA 100, 12259–12264 (2003).

  67. Schmid, C. H., Lau, J., McIntosh, M. W. & Cappelleri, J. C. An empirical study of the effect of the control rate as a predictor of treatment efficacy in meta-analysis of clinical trials. Stat. Med. 17, 1923–1942 (1998).

  68. Berger, R. L. Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295–300 (1982).

  69. Neuhauser, M., Boes, T. & Jockel, K. H. Two-part permutation tests for DNA methylation and microarray data. BMC Bioinformatics 6, 35 (2005).

  70. Barry, W. T., Nobel, A. B. & Wright, F. A. Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21, 1943–1949 (2005).

  71. Pan, W. On the use of permutation in and the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics 19, 1333–1340 (2003).

  72. Xu, R. H. & Li, X. C. A comparison of parametric versus permutation methods with applications to general and temporal microarray gene expression data. Bioinformatics 19, 1284–1289 (2003).

  73. Landgrebe, J., Wurst, W. & Welzl, G. Permutation-validated principal components analysis of microarray data. Genome Biol. 3, RESEARCH0019 (2002).

  74. Troendle, J. F., Korn, E. L. & McShane, L. M. An example of slow convergence of the bootstrap in high dimensions. Am. Stat. 58, 25–29 (2004). This presents an excellent overview of the nuances of resampling methodology that is used in microarray research, and discusses the fact that such methods are not assumption-free panaceas that are valid under all circumstances.

  75. Kennedy, P. E. & Cade, B. S. Randomization tests for multiple regression. Commun. Stat. 25, 923–936 (1996).

  76. Gadbury, G. L., Page, G. P., Heo, M., Mountz, J. D. & Allison, D. B. Randomization tests for small samples: an application for genetic expression data. J. R. Stat. Soc. Ser. C 52, 365–376 (2003).

  77. Yeung, K. Y., Haynor, D. R. & Ruzzo, W. L. Validating clustering for gene expression data. Bioinformatics 17, 309–318 (2001).

  78. Datta, S. & Datta, S. Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 19, 459–466 (2003).

  79. Shih, J. H. et al. Effects of pooling mRNA in microarray class comparisons. Bioinformatics 20, 3318–3325 (2004).

  80. Yeung, K. Y., Medvedovic, M. & Bumgarner, R. E. From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol. 5, R48 (2004).

  81. Bryan, J. Problems in gene clustering based on gene expression data. J. Multivariate Analysis 90, 44–66 (2004). This is an excellent overview of the methodological and conceptual challenges in the use of cluster analysis in gene-expression studies.

  82. Kerr, M. K. & Churchill, G. A. Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc. Natl Acad. Sci. USA 98, 8961–8965 (2001).

  83. Zhang, K. & Zhao, H. Assessing reliability of gene clusters from gene expression data. Funct. Integr. Genomics 1, 156–173 (2000).

  84. Tseng, G. C. & Wong, W. H. Tight clustering: a resampling-based approach for identifying stable and tight patterns in data. Biometrics 61, 10–16 (2005).

  85. Hjorth, J. S. U. Computer Intensive Statistical Methods: Validation, Model Selection and Bootstrap (Chapman and Hall, London, 1994).

  86. Ambroise, C. & McLachlan, G. J. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl Acad. Sci. USA 99, 6562–6566 (2002). This article addresses selection bias in the context of predictive error-estimation and cross-validation for microarray studies.

  87. Furlanello, C., Serafini, M., Merler, S. & Jurman, G. Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics 4, 54 (2003).

  88. Fu, W. J., Carroll, R. J. & Wang, S. Estimating misclassification error with small samples via bootstrap cross-validation. Bioinformatics 21, 1979–1986 (2005).

  89. Dobbin, K. & Simon, R. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6, 27–38 (2005).

  90. Hwang, D., Schmitt, W. A., Stephanopoulos, G. & Stephanopoulos, G. Determination of minimum sample size and discriminatory expression patterns in microarray data. Bioinformatics 18, 1184–1193 (2002).

  91. Mukherjee, S. et al. Estimating dataset size requirements for classifying DNA microarray data. J. Comput. Biol. 10, 119–142 (2003).

  92. Rajeevan, M. S., Ranamukhaarachchi, D. G., Vernon, S. D. & Unger, E. R. Use of real-time quantitative PCR to validate the results of cDNA array and differential display PCR technologies. Methods 25, 443–451 (2001).

  93. Rockett, J. C. & Hellmann, G. M. Confirming microarray data — is it really necessary? Genomics 83, 541–549 (2004).

  94. Rocke, D. M. & Durbin, B. Approximate variance-stabilizing transformations for gene-expression microarray data. Bioinformatics 19, 966–972 (2003).

  95. Pounds, S. & Cheng, C. Statistical development and evaluation of microarray gene expression data filters. J. Comput. Biol. 12, 482–495 (2005).
