Assessing Errors in DNA Barcode Sequence Records

dc.contributor.advisor McNicholas, Paul
dc.contributor.advisor Hanner, Robert
dc.contributor.author Athey, Taryn
2013-10-17T13:04:28Z 2013-10-17T13:04:28Z 2013-10 2013-09-23 2013-10-17
dc.description.abstract DNA barcoding uses a standardized gene region to identify species. In animals, the barcode is a 648 bp region of the cytochrome c oxidase I gene. These sequences are uploaded to the Barcode of Life Data System (BOLD), a reference library that depends on the accuracy of its data. This thesis uses a cross-taxa study to assess a frequency matrix approach for identifying very low frequency variants (VLFs), which represent potential errors within BOLD. In each group analyzed, most VLFs occurred in the first and last 50 bp of the sequences, consistent with known error properties of Sanger sequencing. To correct for this, the success rates of different classification methods on full-length and reduced barcodes were assessed. Neither reducing the barcode nor the number of VLFs affected the success rate of classification. This indicates that trimming barcodes by 50 bp could reduce the overall error in BOLD without affecting species identification. en_US
dc.language.iso en en_US
dc.subject DNA Barcoding en_US
dc.subject Frequency Matrix en_US
dc.subject Very Low Frequency Variant en_US
dc.subject VLF en_US
dc.subject Bioinformatics en_US
dc.subject Classification en_US
dc.subject Barcoding with Logic en_US
dc.title Assessing Errors in DNA Barcode Sequence Records en_US
dc.type Thesis en_US Bioinformatics en_US Master of Science en_US Department of Mathematics and Statistics en_US
dc.rights.license All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
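
The frequency matrix idea described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the function names (`frequency_matrix`, `find_vlfs`, `trim`), the toy sequences, and the threshold value are all assumptions made for illustration, and the sketch assumes the sequences are already aligned to equal length. It also assumes that trimming removes the same number of bases from each end.

```python
from collections import Counter


def frequency_matrix(seqs):
    """Per-position base frequencies for aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # One dict per alignment position: base -> fraction of sequences carrying it.
    return [
        {base: count / n for base, count in Counter(s[i] for s in seqs).items()}
        for i in range(length)
    ]


def find_vlfs(seqs, threshold):
    """Return (sequence index, position, base) for bases rarer than threshold."""
    matrix = frequency_matrix(seqs)
    return [
        (seq_idx, pos, base)
        for seq_idx, s in enumerate(seqs)
        for pos, base in enumerate(s)
        if matrix[pos][base] < threshold
    ]


def trim(seq, n):
    """Drop n bases from each end of a sequence (hypothetical trimming rule)."""
    return seq[n:len(seq) - n]


# Toy data: nine identical sequences and one with a rare base at the last position.
toy = ["ACGT"] * 9 + ["ACGA"]
vlfs = find_vlfs(toy, threshold=0.2)  # illustrative threshold, not from the thesis
trimmed = [trim(s, 1) for s in toy]
```

With real barcodes, `trim(seq, 50)` would correspond to the 50 bp trimming discussed in the abstract, under the each-end assumption stated above.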

Files in this item

Files Size Format
Athey_Taryn_201310_MSc.pdf 4.416Mb PDF
