Using Association Rules and Logical Learning for Clustering

Liu, Jia
University of Guelph

This thesis tests whether the apriori algorithm, combined with Current Best Hypothesis (CBH) logical learning, can be used to cluster data such as environmental data. The procedure is tested on a large, sparse dataset describing the ecological impact of agriculture on macroinvertebrate populations in the TaiZi River Basin of the People’s Republic of China. We compare the association-rule/CBH results with those of two other established algorithms and show that our results are a considerable improvement over both. We confirm the hypothesis that agricultural land use covering roughly 21% or more of the land area in this watershed is the “tipping point” for surface-water degradation. Stochastic dominance is used to further assess the utility of agricultural land use and of mitigation measures, such as riparian barriers that intercept and treat runoff. The new algorithm was successful, but further study of the apriori parameters is needed.
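For readers unfamiliar with apriori, the sketch below shows the standard level-wise procedure: count candidate itemsets, keep those meeting a minimum support, join the survivors into larger candidates, and prune any candidate with an infrequent subset. It then derives association rules that meet a minimum confidence. This is a generic illustration, not the thesis's implementation. The CBH step is omitted, and the site "transactions", item names, and the support and confidence thresholds are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)
    # Level 1 candidates: every distinct single item.
    candidates = [frozenset([i]) for i in {item for t in sets for item in t}]
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in sets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent, so lhs is present.
                conf = supp / frequent[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Hypothetical sampling sites, each described by a set of discretized attributes.
transactions = [
    {"agri_high", "low_diversity"},
    {"agri_high", "low_diversity", "no_buffer"},
    {"agri_low", "high_diversity"},
    {"agri_high", "low_diversity"},
    {"agri_low", "high_diversity", "buffer"},
]

frequent = apriori(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, supp, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={supp:.2f}", f"confidence={conf:.2f}")
```

The minimum support and minimum confidence thresholds are the kind of apriori parameters whose effect the abstract says needs further study. Lowering the support threshold on sparse data rapidly increases the number of candidate itemsets.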

Data mining, Cluster analysis, Apriori, Current Best Hypothesis, Graham Scan, PC-ord, K-means