An effective positive-unlabeled learning method for detecting a large scale of malware variants
Malicious software (malware) can quickly evolve into many different variants that evade existing detection techniques. Machine-learning-based techniques perform well at detecting malware variants, but in industry the volume of variants grows quickly and labelling data is labour-intensive. Companies therefore tend to label only a small fraction of the malware samples and treat the remaining unlabeled samples as benign, which limits detection accuracy. To address this problem, this thesis proposes a cost-sensitive boosting method that trains a detection model on malicious (positive) and unlabeled executables to improve accuracy. Extensive experiments demonstrate that incorporating the proposed method into machine learning algorithms trained on positive and unlabeled datasets improves the final results: it makes the models more reliable and, during training, improves training speed and convergence.
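The abstract does not spell out the boosting scheme. As a rough illustration only, the Python sketch below shows one generic way to make boosting cost-sensitive on positive-unlabeled data. Labeled malware is +1, every unlabeled executable is provisionally -1 (benign), and each sample's weight update is multiplied by a per-sample cost. The decision-stump weak learner, the function names (`cost_sensitive_boost`, `train_stump`, `predict`), the toy one-feature data and the 0.3 cost for unlabeled samples are all hypothetical choices made for this example. None of them is taken from the thesis's method.

```python
import math
from typing import List, Tuple

# Hypothetical weak learner: (feature index, threshold, polarity).
# It predicts `polarity` when x[feature] >= threshold, else -polarity.
Stump = Tuple[int, float, int]


def stump_predict(stump: Stump, x: List[float]) -> int:
    j, thr, pol = stump
    return pol if x[j] >= thr else -pol


def train_stump(X: List[List[float]], y: List[int], cw: List[float]) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest cost-weighted error."""
    best: Stump = (0, X[0][0], 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for pol in (1, -1):
                stump = (j, thr, pol)
                err = sum(c for x, yi, c in zip(X, y, cw) if stump_predict(stump, x) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def cost_sensitive_boost(X, y, costs, rounds=10):
    """Boosting on PU data: y = +1 for labeled malware, -1 for unlabeled samples.

    costs[i] scales sample i's weight every round, so a cost below 1.0 on
    unlabeled samples keeps the booster from chasing unlabeled samples that
    look malicious (they may be unlabeled malware rather than benign files).
    """
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Tuple[float, Stump]] = []
    for _ in range(rounds):
        cw = [c * wi for c, wi in zip(costs, w)]  # cost-weighted distribution
        stump, wrong = train_stump(X, y, cw)
        correct = sum(cw) - wrong
        if wrong == 0:  # perfect stump on the training data: use it alone
            return [(1.0, stump)]
        if correct <= wrong:  # no better than chance under the costs
            break
        alpha = 0.5 * math.log(correct / wrong)
        ensemble.append((alpha, stump))
        # Cost-scaled exponential update, then renormalise to a distribution.
        w = [c * wi * math.exp(-alpha * yi * stump_predict(stump, x))
             for c, wi, yi, x in zip(costs, w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """Sign of the alpha-weighted vote: +1 = malware, -1 = benign."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1


# Toy usage: three labeled malware samples and five unlabeled ones,
# one of which (0.88) is plausibly unlabeled malware.
X = [[0.9], [0.8], [0.85], [0.1], [0.2], [0.15], [0.3], [0.88]]
y = [1, 1, 1, -1, -1, -1, -1, -1]
costs = [1.0] * 3 + [0.3] * 5  # hypothetical cost for unlabeled samples
model = cost_sensitive_boost(X, y, costs)
print(predict(model, [0.88]))  # prints 1
```

On this toy data the low cost on unlabeled samples keeps every round on the threshold at 0.8, so the unlabeled sample at 0.88 is still scored as malware. With all costs set to 1.0 (plain AdaBoost), the second round already switches to a threshold at 0.9 that labels that sample benign. In other words, the uncosted booster starts fitting the possibly wrong "benign" label on unlabeled data.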