A New, Model-Independent, Spectrum-Based Gene Prediction Technique

dc.contributor.advisor Kremer, Stefan C.
dc.contributor.author Marhon, Sajid A.
2015-04-24T18:20:05Z 2016-04-10T05:00:12Z 2015-04 2015-04-10 2015-04-24
dc.description.abstract Detecting protein-coding regions is a fundamental step in genome analysis. This step is a precursor to analyzing protein sequences. Different techniques have been proposed for detecting protein-coding regions in DNA sequences. Popular techniques use probabilistic models such as Hidden Markov Models (HMMs) that depend on homology information (a learning model) in the analysis of DNA sequences, and this makes the application of these techniques restricted to species that have homologs. This approach is classified as a model-dependent method. Digital Signal Processing (DSP)-based techniques, which rely on the spectral analysis of DNA sequences, extract the period-3 spectrum of DNA sequences to detect protein-coding regions. DSP-based techniques are classified as model-independent techniques. In this research, I propose a new, DSP-based technique that overcomes the limitation in the prediction accuracy of the current DSP-based techniques and the problem of application specificity of learning-based techniques. In my technique, I propose four improvements in different stages of the gene prediction process: (1) a dynamic representation scheme, (2) an efficient method for computing nucleotide distribution variance, (3) post-processing to attenuate background noise and detect spectrum peaks without requiring an experimental threshold, and (4) a new wide-range wavelet window. My experimental results show that my technique outperforms all other popular DSP-based techniques. In addition, the comparison of the results of the proposed technique with the popular HMMgene technique shows that my technique performs better on the novel gene detection problem. I believe that this is an area of research that has been underemphasized and deserves additional attention. en_US
dc.language.iso en en_US
dc.subject Gene Prediction en_US
dc.subject Period-3 Spectrum en_US
dc.subject Protein Coding Regions en_US
dc.subject Digital Signal Processing en_US
dc.subject DNA Sequence Analysis en_US
dc.title A New, Model-Independent, Spectrum-Based Gene Prediction Technique en_US
dc.type Thesis en_US Computer Science en_US Doctor of Philosophy en_US School of Computer Science en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
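
The abstract refers to extracting the period-3 spectrum of a DNA sequence to locate protein-coding regions. As a rough illustration of that general idea only, the sketch below computes the classic baseline measure: the DFT power at frequency 1/3, summed over four binary indicator sequences (one per nucleotide) in a sliding window. This is not the technique proposed in the thesis; the fixed binary representation, the rectangular window, the function names `period3_power` and `sliding_period3`, and the default window length of 351 are all assumptions made for this example.

```python
import cmath


def period3_power(window):
    """DFT power at frequency 1/3, summed over the four base indicator sequences.

    Assumption: each base is mapped to a 0/1 indicator sequence (1 where the
    base occurs). Coding regions tend to give a larger value than non-coding ones.
    """
    n = len(window)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Sum the complex exponential only at positions where this base occurs,
        # which equals the DFT coefficient of its indicator sequence at k = n/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(window)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for each window position along the sequence."""
    seq = seq.upper()
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    periodic = "ATG" * 50    # perfectly 3-periodic toy sequence
    aperiodic = "ACGT" * 36  # 4-periodic toy sequence, no period-3 component
    print(period3_power(periodic))
    print(period3_power(aperiodic))
```

On the toy inputs, the 3-periodic sequence yields a large value and the 4-periodic one a value near zero. A baseline like this would still need a threshold to call peaks on real sequences, which is one of the limitations the abstract says the proposed post-processing step avoids.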

Files in this item

Files                          Size      Format   Description
Marhon_Sajid_201504_PhD.pdf    1.809Mb   PDF      Main Thesis Document
