An order estimation based approach to identify response genes for microarray time course data
Microarray time course experiments have been widely used to investigate temporal patterns in gene expression profiles. These profiles provide a unique opportunity to examine genome-wide signal processing and gene responses. A fundamental issue in microarray experimental design is that the treatment condition can be controlled only at the cell level, not at the gene level. Some genes depend on other genes to detect changes in external conditions, and this dependency is not fully deterministic and may vary across genes and treatment conditions. The expression of each gene is therefore potentially affected by two confounding effects: the treatment effect and the gene context effect arising from the regulatory interaction structure among genes. The gene context effect is hard to isolate, yet it cannot simply be ignored. Because it differs across treatment conditions, this gene context information is of primary biological interest and demands attention in statistical analysis. We introduce an approach that deals with the confounding effects and takes the uncontrollable gene context effect into account. Our method estimates, for each gene, the number of hidden states, also referred to as the order, of a hidden Markov model (HMM). The observed gene expressions are modeled by gamma distributions determined by the corresponding hidden state at each time point. Genes showing evidence for more than one hidden state can be categorized as signaling genes or, in a wider sense, as response genes, which are coordinated by a cell system in reaction to a specific external condition. These response genes can be used to compare treatment conditions and to investigate the gene context effect under different treatments. Our method also provides flexibility in adjusting type I error rates to find response genes at different response intensity levels.
Both simulated data and real microarray time course data are analyzed to demonstrate our method.
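To make the model structure concrete, the following is a minimal sketch of an HMM with gamma-distributed emissions for a single gene, written in plain Python. It only illustrates the ingredients named above (hidden states, gamma emissions, and the likelihood under a one-state versus a two-state model). It is not the order estimation procedure itself, which is not specified here. All function names, parameter values, and the choice of a shape/scale parameterization are assumptions made for illustration.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution (shape/scale parameterization)."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def simulate_gamma_hmm(n_steps, init, trans, shapes, scales, rng):
    """Draw a hidden state path and one gamma observation per time point."""
    k = len(init)
    states, obs = [], []
    state = rng.choices(range(k), weights=init)[0]
    for _ in range(n_steps):
        states.append(state)
        obs.append(rng.gammavariate(shapes[state], scales[state]))
        state = rng.choices(range(k), weights=trans[state])[0]
    return states, obs


def forward_loglik(obs, init, trans, shapes, scales):
    """Log-likelihood of obs via the forward algorithm in log space.

    Assumes all initial and transition probabilities are strictly positive.
    """
    k = len(init)
    log_alpha = [math.log(init[j]) + gamma_logpdf(obs[0], shapes[j], scales[j])
                 for j in range(k)]
    for x in obs[1:]:
        log_alpha = [
            logsumexp([log_alpha[i] + math.log(trans[i][j]) for i in range(k)])
            + gamma_logpdf(x, shapes[j], scales[j])
            for j in range(k)
        ]
    return logsumexp(log_alpha)


# Hypothetical example: a gene that switches between a low and a high
# expression state over 12 time points (all values are illustrative).
rng = random.Random(0)
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
states, obs = simulate_gamma_hmm(12, init, trans, [2.0, 9.0], [1.0, 1.0], rng)

# Likelihood under the two-state model versus a single-state (order 1) model
# whose gamma parameters are picked by hand rather than fitted.
ll_two = forward_loglik(obs, init, trans, [2.0, 9.0], [1.0, 1.0])
ll_one = forward_loglik(obs, [1.0], [[1.0]], [4.0], [1.5])
print("two-state log-likelihood:", ll_two)
print("one-state log-likelihood:", ll_one)
```

In this sketch a gene with order one reduces to independent draws from a single gamma distribution, so comparing the two log-likelihoods gives an informal sense of what "evidence for more than one hidden state" means. A real analysis would fit the parameters for each candidate order and calibrate the comparison to control the type I error rate.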