24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Turkey, 16 - 19 May 2016, pp.1777-1780
In recent years, text classification have been widely used. Dimension of text data has increased more and more. Working of almost all classification algorithms is directly related to dimension. In high dimension data set, working of classification algorithms both takes time and occurs over fitting problem. So feature selection is crucial for machine learning techniques. In this study, frequently used feature selection metrics Chi Square (CHI), Information Gain (IG) and Odds Ratio (OR) have been applied. At the same time the method Relevancy Frequency (RF) proposed as term weighting method has been used as feature selection method in this study. It is used for tf.idf term as weighting method, Sequential Minimal Optimization (SMO) and Naive Bayes (NB) in the classification algorithm. Experimental results show that RF gives successful results.