The use of unlabelled data for supervised learning
Abstract
When provided with enough labelled training examples, a supervised learning algorithm can learn reasonably accurately. However, creating enough labelled data to train accurate classifiers is time-consuming and expensive, whereas unlabelled data is usually easy to obtain. This research introduces a novel approach, Guelph Cluster Class (GCC), which uses unlabelled data to improve classification. The novelty of the approach lies in using an unsupervised network, the Self-Organizing Map (SOM), to identify natural clusters in the labelled and unlabelled data. Sub-classes formed from the labelled data are then used to assign labels to unlabelled patterns, producing 'self-labelled' data. The performance of several variants of the GCC system was measured by training a Back-Propagation network on the labelled and self-labelled data. Experiments on several benchmark datasets show improved classification performance, even when very few labelled examples are available.
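The abstract does not say exactly how SOM clusters and labelled sub-classes are combined to label unlabelled patterns. The following is a minimal sketch of the general self-labelling idea, assuming a plain rectangular SOM trained on all patterns and a hypothetical majority-vote rule: each unlabelled pattern takes the most common label among the labelled patterns that share its best-matching SOM node. The function names `train_som`, `best_matching_unit` and `self_label`, the learning-rate and neighbourhood schedules, and the voting rule are illustrative assumptions, not the GCC implementation.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return the SOM node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((wi - xi) ** 2 for wi, xi in zip(weights[n], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM on all patterns (labelled and unlabelled)."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise every node's weight vector with a randomly chosen pattern.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood on the map grid, centred on the BMU.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def self_label(weights, labelled, unlabelled):
    """Hypothetical rule: give each unlabelled pattern the majority label of the
    labelled patterns mapped to the same SOM node; skip nodes with no labelled data."""
    votes = {}
    for x, y in labelled:
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    out = []
    for x in unlabelled:
        counts = votes.get(best_matching_unit(weights, x))
        if counts:
            out.append((x, counts.most_common(1)[0][0]))
    return out


if __name__ == "__main__":
    labelled = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
    unlabelled = [(0.3, 0.1), (0.1, 0.4), (4.8, 5.2), (5.1, 4.7)]
    patterns = [x for x, _ in labelled] + unlabelled
    som = train_som(patterns, rows=1, cols=2, epochs=50)
    print(self_label(som, labelled, unlabelled))
```

In this sketch, the labelled set plus the returned self-labelled pairs would then form the training set for a back-propagation (multilayer perceptron) classifier, which is omitted here for brevity. Skipping patterns whose node holds no labelled example is one conservative design choice; the thesis variants may handle such patterns differently.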