
NEXT GENERATION SEQUENCING DATA IN BOVINE: QUALITY CONTROL, IMPUTATION, AND APPLICATION


Title: NEXT GENERATION SEQUENCING DATA IN BOVINE: QUALITY CONTROL, IMPUTATION, AND APPLICATION
Author: Larmer, Steven G
Department: Department of Animal Biosciences
Program: Animal and Poultry Science
Advisor: Schenkel, Flavio S
Abstract: Next generation sequencing technology has revolutionized the study of the human genome. The applications for full sequence in cattle are far-reaching and broad. There is, however, a limitation to the application of sequence data due to high cost and a large diversity across cattle breeds. The purpose of this study was to investigate parameters related to the quality of bovine sequences and to optimize how variants are called and used in sequence studies. Imputation to whole genome sequence was investigated as a means to expand the sequence data in an effective manner. Across-breed imputation using various clustering algorithms was examined based on genotype and haplotype diversity in the sequenced populations. Crampiness in Holsteins was explored as a poorly understood trait of interest that could not be well described using genotype panels alone. A pipeline was created for sequence quality control, using imputation accuracy as a metric for reference population quality. Phred-scaled genotype quality score was the single most important factor in determining which calls would be imputed accurately, and filtering on the quality of specific SNPs per animal was the most effective method of quality control. Clustering individuals based on reconstructed haplotypes was more effective than using clustered genotypes for imputation from 50k and 777k genotypes to sequence. A by-product of this analysis was a model to select individuals to be removed from imputation studies due to a low predicted imputation accuracy. A GWAS was carried out using sequence data on crampiness in Holsteins; however, no major genes were found. This led to the conclusion that sequence data may not always be useful if the trait of interest is multi-genic and has low heritability, and that a larger population of true and imputed sequences may be required for more powerful association analysis.
Sequence data will be critical to the future of understanding the bovine genome and the effects it has on important traits in the dairy and beef industries. This research emphasizes the need for further work to find causative mutations, and the global drive to sequence more individuals so that the population is large enough to make meaningful associations.
URI: http://hdl.handle.net/10214/9625
Date: 2016-05
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.


Files in this item

File: Larmer_Steven_201605_PhD.pdf
Size: 1.703 MB
Format: PDF
