A new form of nested association pattern for data mining and class discrimination

dc.contributor.advisor: Chiu, David K.Y.
dc.contributor.author: Lui, Thomas Wing Hong
dc.degree.department: Department of Computing and Information Science (en_US)
dc.degree.grantor: University of Guelph (en_US)
dc.degree.name: Doctor of Philosophy (en_US)
dc.description.abstract: To make patterns easier to interpret by exposing the complex internal associations among their multiple values, we present a new form of high-order (multi-value) pattern, the Nested High-Order Pattern (NHOP). The proposed pattern has a nested granular structure that highlights a hierarchical and iterative evaluation of association. The multi-value association pattern, as defined, also generalizes the sequential pattern commonly used in data mining, since it represents a set of associated values extracted from the sampling outcomes of a random N-tuple, and these values need not be contiguous. Furthermore, because it is a value pattern drawn from multiple variables, it is more descriptive than its corresponding variable pattern. A pattern is detected by statistical testing when its occurrence deviates significantly from that expected under a prior model or null hypothesis. Although NHOP is important in its own right for understanding the association structure of a dataset, we further extend it to perform classification as a discrimination pattern. The rationale is that a pattern that is both a meaningful association and predictive of class can reinforce the detection of the underlying regularity and hence improve understanding of the data domain. We propose a classification method based on Nested High-Order Patterns, referred to as C-NHOP. The relevance of NHOP for pattern discovery is evaluated using synthetic data, machine learning benchmark datasets, and real-life biomolecular and organismal biological datasets. The first evaluation compares two closely related types of pattern: the general high-order pattern and the proposed NHOP. The second evaluation uses data from a biomolecule family known as the SH3 domain, a model protein-protein interaction mediator.
We propose two algorithms, r-Tree and Best-k, to extract a set of patterns using the maximized criterion of NHOP. The goal is to identify the relationship between the primary structure and the three-dimensional structure of the molecule. The relevance of C-NHOP is also evaluated using 26 machine learning benchmark datasets. Experiments show that C-NHOP is very competitive in classification tasks. Finally, the proposed NHOP-based classifier is applied to an important problem in biotechnology: differentiating transgenic and conventional pig lines by the chemical composition of their tissues. Since these data can be used to evaluate how genetic manipulations relate to their physiochemical consequences, a reliable evaluation is extremely important to this newly developed technology in biology. Overall, we conclude that the proposed pattern is significant and extremely useful in data mining and class discrimination tasks. (en_US)
dc.publisher: University of Guelph (en_US)
dc.rights.license: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.
dc.subject: nested association pattern (en_US)
dc.subject: data mining (en_US)
dc.subject: class discrimination (en_US)
dc.subject: Nested High-Order Pattern (en_US)
dc.title: A new form of nested association pattern for data mining and class discrimination (en_US)

