Granular Association Testing in p53 Multiple Sequence Alignment
In biomolecules, the relationships among sequence, molecular structure, and biological function are of great importance to applied fields such as nanotechnology and drug discovery. Previous studies of multiple sequence alignments of biomolecules have shown that interdependent, or associated, sites are indicative of the structural and functional characteristics of biomolecules, extending methods such as consensus sequence analysis. In this thesis, a new method for detecting associated sites in aligned sequence ensembles is proposed. The method applies two-dimensional contingency table analysis over multiple sub-tables (or levels), following an approach known as granular computing, which represents information at different levels of granularity or resolution. When the associations among multiple sites in the sequence alignment converge, they reveal points of interrelatedness among the sites of the biomolecule.
The study proceeds in two phases. In the first phase, the sites in the p53 protein multiple sequence alignment are labeled according to the detected patterns: each site is classified as one of three types based on its characteristics, namely conserved sites, associated sites, and hypervariate sites. The proposed method is used to identify and label the associated sites. In the second phase, the significance of the extracted site patterns is evaluated with respect to selected structural and functional characteristics of the p53 protein. The results indicate that the extracted site patterns, in combination with the conserved sites, are significantly associated with known functional properties of p53, such as post-translational modifications and the mutation frequency of the sites, thereby establishing a link between the identified sites and these functions.
Furthermore, when these sites are aligned with p63 and p73, the homologs of p53, on the basis of their common domains, the sites significantly discriminate among the human sequences of the p53 family. The study therefore underscores the importance of the detected sites, which may account for differences in the cancer-suppressing properties of the family members.
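The building block of the contingency table analysis described above is a test of association between two aligned sites. The sketch below shows only that single-table step, under stated assumptions: it uses Pearson's chi-square statistic (the abstract does not name the test statistic), and the function names `site_column`, `contingency_table`, and `pearson_chi_square` are hypothetical. It does not attempt to reproduce how the thesis partitions a table into sub-tables or levels.

```python
from collections import Counter


def site_column(sequences, k):
    """Return the residues observed at aligned position k across all sequences."""
    return [seq[k] for seq in sequences]


def contingency_table(col_a, col_b):
    """Count co-occurring residue pairs at two aligned sites.

    Rows are the residues seen at site A, columns the residues seen at site B.
    Only residues that actually occur are included, so no margin is zero.
    """
    rows = sorted(set(col_a))
    cols = sorted(set(col_b))
    pairs = Counter(zip(col_a, col_b))
    table = [[pairs[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def pearson_chi_square(table):
    """Return (chi2, degrees_of_freedom) for an r x c table of counts.

    Assumption: Pearson's chi-square is used as the association statistic;
    converting chi2 to a p-value is left to a statistics library.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


if __name__ == "__main__":
    # Toy alignment (hypothetical data): site 0 is conserved,
    # sites 1 and 2 co-vary perfectly (K with V, R with L).
    alignment = ["MKV", "MKV", "MRL", "MRL"]
    rows, cols, table = contingency_table(site_column(alignment, 1),
                                          site_column(alignment, 2))
    print(rows, cols, table)           # ['K', 'R'] ['L', 'V'] [[0, 2], [2, 0]]
    print(pearson_chi_square(table))   # (4.0, 1)
```

A conserved site produces a table with a single row or column, which yields zero degrees of freedom and a statistic of zero. This is one reason conserved sites are labeled separately rather than tested for association.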