A segmentation algorithm for consensus regions in biosequences
The availability of a large number of protein sequences, together with functional information in many biological resources, provides an opportunity for detailed analysis of biomolecules. Many biological and biochemical properties are reflected by molecular regions rather than by a single molecular point. Statistical patterns, defined in various ways, have proven flexible for evaluating the complex relationships between a protein's sequence and its biological function or molecular structure. In this research, we develop a method to specify statistical complex patterns for segmentation, and we present an algorithm and criteria for detecting consensus regions. The algorithm analyzes positional statistical complex patterns and divides the sequence into sequential regions based on these patterns. In the experiments, tumor suppressor p53 is examined using aligned sequences from 31 species. The results show a significant association between the consensus regions and (1) the 3D molecular structure and (2) the characteristics of point mutations in cancer patients. A similar relationship with 3D molecular structure is also found in additional experiments on lysozyme.
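To make the idea of dividing an alignment into sequential regions concrete, the following is a minimal sketch and not the algorithm developed in this work. It substitutes a much simpler positional statistic, the Shannon entropy of each alignment column, for the statistical complex patterns described above, and it groups consecutive columns that fall on the same side of a cutoff. The function names `column_entropy` and `segment`, the `threshold` value, and the "conserved"/"variable" labels are all assumptions introduced for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def segment(alignment, threshold=0.5):
    """Split aligned sequences into consecutive (start, end, label) regions.

    Positions are 0-based and `end` is exclusive. A column is labelled
    "conserved" when its entropy is at or below `threshold`; the cutoff is
    a hypothetical stand-in for the paper's segmentation criteria.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # Label every column independently from its residue distribution.
    labels = [
        "conserved"
        if column_entropy([seq[i] for seq in alignment]) <= threshold
        else "variable"
        for i in range(length)
    ]

    # Merge runs of identically labelled columns into sequential regions.
    regions = []
    start = 0
    for i in range(1, length + 1):
        if i == length or labels[i] != labels[start]:
            regions.append((start, i, labels[start]))
            start = i
    return regions
```

For a toy alignment such as `["ACDEFG", "ACDKLG", "ACDMNG"]`, `segment` returns three regions: columns 0 to 2 and column 5 are identical across sequences, while columns 3 and 4 vary. A real analysis would replace the entropy test with the pattern-based criteria of the method and would need to handle gap characters explicitly.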