Automating tools for scanning, characterizing and analyzing large DNA sequence data sets

von Königslöw, Taika
University of Guelph

Progress in DNA sequencing techniques has facilitated high-throughput research by providing greater speed and accuracy at a lower cost. The sheer volume of data generated requires new methods for managing and analyzing large DNA sequence data sets. This thesis focuses on the development of two computer programs that help to meet this need for augmented analytical tools: 'SeqCleanR' (Chapter 2) and 'Diagnostica' (Chapter 3). 'SeqCleanR' uses profile hidden Markov models to help manage large data sets and to provide an additional level of error detection and characterization prior to sequence analysis. 'Diagnostica' employs a newly developed method for locating single and compound DNA sequence characters that are diagnostic of the data sets provided.
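To make the idea of a single diagnostic character concrete, the following is a minimal sketch in Python. It is not the method implemented in 'Diagnostica', which the abstract does not detail. It assumes the sequences are already aligned to equal length and uses a deliberately strict definition: a column is diagnostic for a group when every sequence in that group shares one base and no sequence in any other group carries that base at that column. The function name `simple_diagnostic_sites` and the example data are hypothetical, and compound characters (combinations of columns) are not covered.

```python
from typing import Dict, List


def simple_diagnostic_sites(groups: Dict[str, List[str]], target: str) -> Dict[int, str]:
    """Return {column index: base} for columns that separate `target` from all other groups.

    Assumption (not from the thesis): a column qualifies when the target group
    is fixed for one base and that base never occurs there in any other group.
    All sequences are assumed to be aligned and of equal length.
    """
    target_seqs = groups[target]
    other_seqs = [s for name, seqs in groups.items() if name != target for s in seqs]
    length = len(target_seqs[0])

    sites: Dict[int, str] = {}
    for col in range(length):
        target_bases = {s[col] for s in target_seqs}
        if len(target_bases) != 1:
            continue  # target group is variable at this column
        base = next(iter(target_bases))
        if base in "-N":
            continue  # ignore gaps and ambiguous calls
        if all(s[col] != base for s in other_seqs):
            sites[col] = base
    return sites


# Hypothetical toy alignment with two groups.
example_groups = {
    "species_a": ["ACGTAC", "ACGTAC"],
    "species_b": ["ACGAAC", "ACGAAT"],
}

sites_a = simple_diagnostic_sites(example_groups, "species_a")
sites_b = simple_diagnostic_sites(example_groups, "species_b")
```

In this toy alignment only column 3 (0-based) separates the two groups. Column 5 does not qualify for either group, because `species_b` is variable there and shares `C` with `species_a`.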

DNA sequencing techniques, DNA sequence datasets, computer programs, augmented analytical tools, SeqCleanR, Diagnostica, error detection, characterization, sequence analysis, diagnostic