Assessment of Intrinsically Disordered Protein Phylogenetics based on Tree Precision in the Lee-Ashlock space
Abstract
Intrinsically disordered proteins (IDPs) lack a stable three-dimensional structure yet are biologically functional, challenging the structure-function paradigm. Most IDPs also evolve faster and undergo more frequent insertions and deletions than ordered proteins. My research focuses on disordered plant proteins known as dehydrins (dehydration-induced proteins), which are strongly associated with preventing damage caused by dehydration. Dehydrin sequences are highly variable and are classified into one of five architectures based on the presence of three conserved motifs (the Y-, S-, and K-segments): Kₙ, SKₙ, KₙS, YₙKₙ, and YₙSKₙ. To gain a deeper understanding of dehydrin evolution, I performed a phylogenetic analysis of 426 dehydrin sequences. From the phylogenetic tree, I identified recent divergences between dehydrin architectures. However, among more divergent dehydrins, architecture splits became harder to identify because phylogenetic support weakened, possibly because their multiple sequence alignments (MSAs) are more prone to errors. It has been suggested that IDPs are difficult to align because they are less conserved than ordered proteins, but to my knowledge this has not been directly examined. Using sets of orthologous IDP sequences and sets of orthologous ordered protein sequences from mammals, I inferred phylogenetic trees with both MSA-based and alignment-free methods, and assessed each tree by how close it was to the mammalian species tree, measured as a normalized Lee-Ashlock distance. In general, there was no significant difference in tree precision between the alignment-based trees of IDPs and those of ordered proteins, suggesting that IDP MSAs produce trees as precise as those from ordered protein MSAs. Additionally, the alignment-based methods generally produced more precise trees than the alignment-free methods, although some individual sequence sets yielded more precise trees with alignment-free methods.
Although the highly variable sequences of dehydrins make them challenging to compare, my findings suggest that aligning IDPs is just as suitable as aligning ordered proteins for phylogenetic analysis. Optimizing alignment-free methods may yield more precise trees for certain sequence sets and warrants further study.
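The five dehydrin architectures are defined by which of the Y-, S-, and K-segments a sequence contains and, for SKₙ versus KₙS, where the S-segment sits relative to the K-segments. The sketch below illustrates that classification rule under stated assumptions: the input is a hypothetical, already-annotated list of segment labels in N- to C-terminal order (motif detection itself is not shown), the helper name `classify_architecture` is hypothetical, and the handling of edge cases (an S-segment between K-segments, or no K-segment at all) is a simplification rather than the thesis's own procedure.

```python
from typing import Sequence


def classify_architecture(segments: Sequence[str]) -> str:
    """Assign a dehydrin architecture from segment labels ('Y', 'S', 'K')
    listed in N- to C-terminal order. Hypothetical helper; the subscript n
    is written as a plain 'n' in the returned labels."""
    has_y = "Y" in segments
    has_s = "S" in segments
    has_k = "K" in segments

    # Every architecture contains at least one K-segment (assumption:
    # anything without one is left unclassified).
    if not has_k:
        return "unclassified"

    # Y-segment present: YnSKn if an S-segment is also present, else YnKn.
    if has_y:
        return "YnSKn" if has_s else "YnKn"

    # No Y-segment but an S-segment: its position relative to the first
    # K-segment separates SKn from KnS (simplified assumption).
    if has_s:
        return "SKn" if segments.index("S") < segments.index("K") else "KnS"

    # Only K-segments.
    return "Kn"


if __name__ == "__main__":
    for example in (["K", "K", "K"], ["S", "K", "K"], ["K", "K", "S"],
                    ["Y", "Y", "K", "K"], ["Y", "S", "K", "K"]):
        print(example, classify_architecture(example))
```

For example, a sequence annotated as `["Y", "S", "K", "K"]` is classified as YnSKn, while `["K", "K", "S"]` is classified as KnS. Working from ordered labels rather than raw sequences keeps the sketch focused on the architecture rule itself.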