Assessing Errors in DNA Barcode Sequence Records

Title: Assessing Errors in DNA Barcode Sequence Records
Author: Athey, Taryn
Department: Department of Mathematics and Statistics
Program: Bioinformatics
Advisor: McNicholas, Paul; Hanner, Robert
Abstract: DNA barcoding uses a standardized gene region to identify species. In animals, the barcode is a 648 bp region of the cytochrome c oxidase I gene. These sequences are uploaded to the Barcode of Life Data System (BOLD), a reference library that depends on the accuracy of its data. This thesis uses a cross-taxa study to assess a frequency matrix approach for identifying very low frequency variants (VLFs), which represent potential errors within BOLD. In each group analyzed, most VLFs occurred in the first and last 50 bp of the sequences, consistent with known error properties of Sanger sequencing. To correct for this, the success rates of different classification methods on full-length and reduced barcodes were assessed. Neither the reduction of the barcode nor the number of VLFs affected the success rate of classification. This indicates that trimming barcodes by 50 bp could reduce the overall error in BOLD without affecting species identification.
URI: http://hdl.handle.net/10214/7588
Date: 2013-10


Files in this item

File: Athey_Taryn_201310_MSc.pdf
Size: 4.416Mb
Format: PDF
