Improved Nefclass For Datasets With Skewed Feature Values

Title: Improved Nefclass For Datasets With Skewed Feature Values
Author: Yousefi, Jamileh
Department: School of Computer Science
Program: Computer Science
Advisor: Hamilton-Wright, Andrew
Abstract: Most machine learning algorithms perform poorly on datasets whose feature values have skewed distributions. Skewed feature values are commonly observed in biological and medical datasets, which poses a challenge for the classification of medical data. Neuro-fuzzy systems are common machine learning approaches in the medical domain because of their ability to learn fuzzy rules from training data and represent the rules in an understandable way. Addressing skewness in neuro-fuzzy systems is therefore a topic of interest. In this thesis, the NEFCLASS neuro-fuzzy classifier is extended to provide improved classification accuracy over the original NEFCLASS classifier when trained on skewed data. To improve accuracy, we used two methods. First, we used two alternative discretization methods. Second, we devised several asymmetric linguistic hedges. The accuracy-transparency trade-off is also one of the most notable challenges when applying machine learning tools in the medical domain. Therefore, the second problem addressed is improving the transparency of NEFCLASS without significant deterioration in accuracy. We have devised a statistical rule pruning algorithm that uses adjusted residuals to reduce the number of rules, thus improving transparency. Moreover, a hybrid approach combining the above approaches is proposed. The algorithms have been evaluated on synthetic F-Distributed and Circular-Uniform Distributed datasets. Additionally, they have been assessed using real-world electromyography and Wisconsin Diagnostic Breast Cancer datasets, which are known to have highly skewed feature values. We evaluated the accuracy of the classifiers using misclassification percentages, and the transparency of the rule-based classifiers using the number of rules. Both independently and in combination, our three approaches provide a considerable improvement in classification accuracy and transparency on skewed data.
This research can contribute to improved decision-making in healthcare or in any other area where a significant fraction of the domain data has highly skewed distributions of feature values. In particular, our strategy has led to greater diagnostic accuracy in distinguishing neuromuscular diseases using electromyography data. This methodology is not limited to NEFCLASS and neuro-fuzzy systems because our approaches are not directly tied to the structure of NEFCLASS. Hence, we expect that our techniques can be applied to any application in which fuzzy logic is used. Furthermore, our rule pruning approach has the potential to be used in other fuzzy and non-fuzzy classifiers.
URI: http://hdl.handle.net/10214/12610
Date: 2018-03
Rights: Attribution-NonCommercial-NoDerivs 2.5 Canada
Terms of Use: All items in the Atrium are protected by copyright with all rights reserved unless otherwise indicated.


Files in this item

File: Jamileh_Yousefi_201804_PhD.pdf
Size: 14.52Mb
Format: PDF


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 2.5 Canada.