On Improved Generalization of 5-State Hidden Markov Model-based Internet Traffic Classifiers
The multitude of services delivered over the Internet would have been difficult to fathom 40 years ago, when much of the initial design was being undertaken. As a consequence, the resulting architecture made no provision for differentiating between, and managing the potentially conflicting requirements of, different types of services such as real-time voice communication and peer-to-peer file sharing. Because of this shortcoming, services with conflicting requirements often interfere with each other, ultimately decreasing the effectiveness of the Internet as an enabler of new and transformative services. The ability to passively identify different types of Internet traffic would address this shortcoming by enabling effective management of conflicting types of services, and would also facilitate a better understanding of how the Internet is used in general. Recent attempts at developing such techniques have shown promising results in simulation environments but perform considerably worse when deployed in real-world scenarios. One possible reason for this discrepancy is the implicit assumption, shared by recent approaches, about the degree of similarity between the many networks that comprise the Internet. This thesis quantifies the degradation in performance that can be expected when this assumption is violated and demonstrates alternative classification techniques that are less sensitive to such violations.