Large-scale clustering of antigen receptor gene sequence data using hyper-dimensional point packing

Title: Large-scale clustering of antigen receptor gene sequence data using hyper-dimensional point packing
Author: Chang, Haiyang
Department: Department of Pathobiology
Program: Bioinformatics
Advisor: Keller, Stefan
Abstract: Lymphocytes generate an abundance of antigen receptor (AR) genes to recognize an almost unlimited number of epitopes. One challenge is to group AR sequences that recognize a common epitope. Traditional methods rely on hierarchical clustering, which carries a significant computational cost because of its pairwise genetic distance comparisons. In this thesis, a point packing strategy was applied to break the data down incrementally into subsets, limiting pairwise sequence comparison to the final cluster level. Sub-setting was achieved by iteratively picking maximally spaced anchor sequences from the dataset and assigning each remaining sequence to its closest anchor. The result is an inverted tree with anchor sequences as nodes and an anchor distance that decreases from layer to layer. In addition, new sequences can be added to an already clustered dataset by comparing them with the existing anchor nodes, which positions them quickly and substantially reduces the computational burden.
URI: http://hdl.handle.net/10214/14079
Date: 2018-07-20
Rights: Attribution-NonCommercial-NoDerivs 2.5 Canada
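
The sub-setting step described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes a greedy farthest-first traversal as a stand-in for "maximally spaced" anchor picking, and a Hamming distance on equal-length sequences as a stand-in for the genetic distance. The function names (hamming, pick_anchors, closest_anchor, assign) are hypothetical. Only a single layer is shown; applying the same steps to each subset with more closely spaced anchors would yield the deeper layers of the tree.

```python
from typing import Dict, List, Sequence


def hamming(a: str, b: str) -> int:
    """Count mismatched positions (assumed metric; needs equal lengths)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_anchors(seqs: Sequence[str], k: int) -> List[str]:
    """Greedy farthest-first traversal (assumed strategy): each new anchor
    is the sequence farthest from its nearest already-chosen anchor."""
    anchors = [seqs[0]]  # arbitrary starting anchor
    # nearest[i] holds the distance from seqs[i] to its closest anchor so far
    nearest = [hamming(s, anchors[0]) for s in seqs]
    while len(anchors) < min(k, len(seqs)):
        idx = max(range(len(seqs)), key=nearest.__getitem__)
        if nearest[idx] == 0:  # every sequence already coincides with an anchor
            break
        anchors.append(seqs[idx])
        nearest = [min(d, hamming(s, seqs[idx])) for d, s in zip(nearest, seqs)]
    return anchors


def closest_anchor(seq: str, anchors: Sequence[str]) -> str:
    """Return the anchor nearest to seq; also usable to place a new sequence."""
    return min(anchors, key=lambda a: hamming(seq, a))


def assign(seqs: Sequence[str], anchors: Sequence[str]) -> Dict[str, List[str]]:
    """Group every sequence under its closest anchor."""
    subsets: Dict[str, List[str]] = {a: [] for a in anchors}
    for s in seqs:
        subsets[closest_anchor(s, anchors)].append(s)
    return subsets


if __name__ == "__main__":
    data = ["AAAA", "AAAT", "TTTT", "TTTA"]  # toy data, not from the thesis
    chosen = pick_anchors(data, 2)
    print(chosen)
    print(assign(data, chosen))
    # A new sequence is compared with the anchors only, not with every sequence.
    print(closest_anchor("TTAT", chosen))
```

In this sketch a new sequence costs one distance computation per anchor rather than one per stored sequence, which is the saving the abstract attributes to comparison with existing anchor nodes.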


Files in this item

File: Chang_Haiyang_201808_Msc.pdf
Size: 3.661Mb
Format: PDF
Description: Main article

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 2.5 Canada.