Recursive feature addition: A novel feature selection technique, including a proof of concept in network security
Protecting information on the Internet is an important objective, and its importance grows over time. The Internet experiences many attacks every day, which motivates all involved parties to look for solutions. One of the most common solutions for providing network security is a Network Intrusion Detection System (NIDS). A NIDS usually relies on a classifier that labels the data extracted from incoming traffic as either a normal or an attack connection. Feature selection is an early phase that must be performed carefully before classification; otherwise, the overall performance of the NIDS can deteriorate severely. However, a NIDS that deals with new attacks faces two challenges, both stemming from the availability of only a small number of examples with many features. The first challenge, which affects the NIDS classifier, is overfitting. The second challenge, which affects feature selection methods, is finding interdependent features. This thesis focuses on feature selection to address these challenges. Its contributions include proposing and implementing a new feature selection method called Recursive Feature Addition (RFA). RFA relies on a Support Vector Machine (SVM) classifier and selects features in a forward, recursive fashion. RFA was tested on a synthetic data set, where it demonstrated its ability to detect interdependent features, and on real-world high-dimensional data sets, where it outperformed Recursive Feature Elimination (RFE). Another contribution is the application of RFA to the ISCX 2012 data set, where RFA outperformed RFE in detecting intrusions. A new metric that combines three well-known metrics is also proposed for evaluating NIDS applications. Furthermore, four ranking coefficients are proposed for use with RFA alongside the original one, and each was tested with RFA on the ISCX data set.
Statistical testing confirmed RFA's superiority over RFE across different performance metrics and data sets. Moreover, the work implements multi-class intrusion detection, in which the NIDS identifies the specific attack type instead of only raising an alarm when an intrusion occurs.
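The abstract describes RFA only at a high level: an SVM-driven search that adds features in a forward, recursive fashion. As a rough illustration of that search pattern, the following is a minimal sketch of a generic greedy forward (wrapper) selection loop around a linear SVM. It is not the thesis's implementation, and several assumptions are made: the linear SVM is trained with Pegasos-style stochastic subgradient descent so the example needs only the Python standard library; validation accuracy stands in for RFA's actual ranking coefficient, which the abstract does not specify; and all function names (`train_linear_svm`, `project`, `forward_select`, `make_synthetic`) and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def train_linear_svm(X: Matrix, y: Sequence[int], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent. Labels must be -1/+1. The bias is
    learned as the weight of the constant 1.0 column added by `project`."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def accuracy(w: Sequence[float], X: Matrix, y: Sequence[int]) -> float:
    """Fraction of rows whose predicted sign matches the -1/+1 label."""
    hits = sum((sum(a * b for a, b in zip(w, xi)) >= 0) == (yi > 0)
               for xi, yi in zip(X, y))
    return hits / len(X)


def project(X: Matrix, cols: Sequence[int]) -> Matrix:
    """Keep only the chosen feature columns and append a bias column."""
    return [[row[c] for c in cols] + [1.0] for row in X]


def forward_select(X_tr: Matrix, y_tr: Sequence[int], X_val: Matrix,
                   y_val: Sequence[int], k: int) -> Tuple[List[int], List[float]]:
    """Greedily add the feature whose inclusion gives the best validation
    accuracy, until k features are selected. Returns (features, accuracies)."""
    selected: List[int] = []
    remaining = list(range(len(X_tr[0])))
    history: List[float] = []
    while remaining and len(selected) < k:
        best_feat, best_acc = remaining[0], -1.0
        for f in remaining:
            cols = selected + [f]
            w = train_linear_svm(project(X_tr, cols), y_tr)
            acc = accuracy(w, project(X_val, cols), y_val)
            if acc > best_acc:  # strict '>' keeps the lower-index feature on ties
                best_feat, best_acc = f, acc
        selected.append(best_feat)
        remaining.remove(best_feat)
        history.append(best_acc)
    return selected, history


def make_synthetic(n: int, seed: int) -> Tuple[Matrix, List[int]]:
    """Hypothetical toy data: 4 features, label depends only on x0 + x2."""
    rng = random.Random(seed)
    X: Matrix = []
    y: List[int] = []
    for _ in range(n):
        row = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        X.append(row)
        y.append(1 if row[0] + row[2] > 0 else -1)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_synthetic(200, seed=1)
    X_val, y_val = make_synthetic(100, seed=2)
    chosen, accs = forward_select(X_tr, y_tr, X_val, y_val, k=2)
    print("selected features:", chosen)
    print("validation accuracy after each addition:", accs)
```

Replacing validation accuracy with a different per-candidate score, assuming the ranking coefficients can be expressed that way, would only change the line that computes `acc`. Because this simplified loop adds one feature at a time and uses a linear model, it would not by itself recover XOR-style interdependent features; the abstract's claim on that point concerns RFA, not this sketch.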