Feature selection and weighting for sentiment analysis
Sentiment analysis is a subfield of natural language processing that automatically classifies a piece of text according to the positive or negative opinions it expresses. Two main challenges in sentiment analysis are identifying the best words, or features, on which to base classification decisions, and correctly weighting the contribution of each feature to the sentiment expressed in the text. In this thesis we address both challenges. We propose a new feature selection method that automatically identifies features from training examples, and we compare it with three other feature selection methods that have been shown to work well in previous research. We also propose a method for weighting the importance of features based on their part-of-speech categories. Our experimental results show that these feature selection methods, combined with our part-of-speech feature weighting method, can improve the performance of sentiment analysis.