A Novel Statistical Framework for Assessment of Intraspecific Haplotype Sampling Sufficiency: Implications for DNA Barcode Gap Estimation

Title: A Novel Statistical Framework for Assessment of Intraspecific Haplotype Sampling Sufficiency: Implications for DNA Barcode Gap Estimation
Author: Phillips, Jarrett
Department: School of Computer Science
Program: Computational Sciences
Advisor: Gillis, Daniel; Hanner, Robert
Abstract: The problem of determining adequate sample sizes necessary for studies of biodiversity conservation and management is a challenging one that has received some attention in recent years. One particular area where the probing of sampling completeness is of utmost priority is DNA barcoding. Species show remarkable genomic marker variation within and among taxa, along with differing evolutionary and life histories. Thus, knowing how many specimens of a given species likely need to be collected to observe the majority of standing COI haplotype diversity present within animal species is a complex question to answer. Estimates of specimen sample sizes for DNA barcoding range from a single individual to hundreds of individuals per species (but typically around 5-10 individuals). However, due to obstacles surrounding project funding and species rarity, often just one or two specimens per species can be reasonably collected. In addition, numerous other factors, especially sequence quality and integrity, hinder the accurate and reliable estimation of specimen sample sizes from existing species-level sequence data found in large DNA repositories. Here, a deep examination of the genetic specimen sample size problem (GSSSP) is undertaken. Specifically, a novel nonparametric stochastic local search optimization algorithm based on trends in species haplotype accumulation curves, herein called HACSim (Haplotype Accumulation Curve Simulator), is introduced. The method, available as an R package, is tested on a variety of both hypothetical and real animal species mined from the Barcode of Life Data Systems (BOLD). Through a detailed statistical simulation study, the approach is demonstrated to work well across all examined scenarios. As HACSim makes numerous simplifying assumptions that are unlikely to hold well in practice, such as panmixia (random mating), future work in incorporating elements of population structure is imperative.
In addition, it is argued that DNA barcoding currently lacks the statistical rigor needed to robustly estimate the DNA barcode gap, an important quantity expressing the difference between intraspecific and interspecific genetic variation. A number of accessible statistical solutions revolving around sample sizes needed for gap assessment, as well as visualization and inference, are offered in this regard.
Date: 2022-05-05
Rights: Attribution 4.0 International
Related Publications:
Phillips, J.D., Gillis, D.J. and Hanner, R.H. (2019). Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecology and Evolution, 9(5): 2996-3010. DOI: 10.1002/ece3.4757.
Phillips, J.D., French, S.H., Hanner, R.H. and Gillis, D.J. (2020). HACSim: An R package to estimate intraspecific sample sizes for genetic diversity assessment using haplotype accumulation curves. PeerJ Computer Science, 6(192): 1-37. DOI: 10.7717/peerj-cs.243.
Phillips, J.D., Gillis, D.J. and Hanner, R.H. (2022). Lack of statistical rigor in DNA barcoding likely invalidates the presence of a true species’ barcode gap. Frontiers in Ecology and Evolution, 10: 859099. DOI: 10.3389/fevo.2022.859099.

Files in this item

File: Phillips_Jarrett_202205_PhD.pdf
Size: 20.88 MB
Format: PDF

Except where otherwise noted, this item's license is described as Attribution 4.0 International.