Feature Selection in Text Classification

Şahin D. Ö., Ateş N., Kılıç E.

24th Signal Processing and Communication Application Conference (SIU), Zonguldak, Turkey, 16 - 19 May 2016, pp.1777-1780

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/siu.2016.7496105
  • City: Zonguldak
  • Country: Turkey
  • Page Numbers: pp.1777-1780
  • Keywords: text classification, feature selection, term weighting
  • Ondokuz Mayıs University Affiliated: Yes


In recent years, text classification has been widely used, and the dimensionality of text data has grown steadily. The performance of almost all classification algorithms depends directly on dimensionality: on high-dimensional data sets, classification is slow and prone to overfitting. Feature selection is therefore crucial for machine learning techniques. In this study, the frequently used feature selection metrics Chi Square (CHI), Information Gain (IG) and Odds Ratio (OR) are applied. In addition, Relevancy Frequency (RF), which was proposed as a term weighting method, is used here as a feature selection method. tf.idf is used as the term weighting method, and Sequential Minimal Optimization (SMO) and Naive Bayes (NB) are used as the classification algorithms. Experimental results show that RF gives successful results.
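
The four metrics named in the abstract can all be computed from a per-term contingency table of document counts. The following is a minimal sketch using the commonly cited textbook definitions, which are not necessarily the exact variants used in the paper. The function names, the smoothing constant `eps`, and the example counts are assumptions made for illustration. Here `a` and `b` are the numbers of positive and negative documents that contain the term, and `c` and `d` are the numbers of positive and negative documents that do not.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def _entropy(probs):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(a, b, c, d):
    """Reduction in category entropy from knowing whether the term occurs."""
    n = a + b + c + d
    h_class = _entropy([(a + c) / n, (b + d) / n])
    with_t, without_t = a + b, c + d
    h_with = _entropy([a / with_t, b / with_t]) if with_t else 0.0
    h_without = _entropy([c / without_t, d / without_t]) if without_t else 0.0
    return h_class - (with_t / n) * h_with - (without_t / n) * h_without


def odds_ratio(a, b, c, d, eps=0.5):
    """Log odds ratio; eps is an assumed smoothing term to avoid division by zero."""
    return math.log(((a + eps) * (d + eps)) / ((b + eps) * (c + eps)))


def relevance_frequency(a, b):
    """RF as usually stated for term weighting: log2(2 + a / max(1, b))."""
    return math.log2(2 + a / max(1, b))


def top_k_terms(table, score, k):
    """Rank terms by a score function and keep the k highest-scoring ones."""
    ranked = sorted(table, key=lambda term: score(*table[term]), reverse=True)
    return ranked[:k]


# Hypothetical counts (a, b, c, d) for three terms over 100 documents.
counts = {
    "goal": (40, 10, 10, 40),
    "the": (25, 25, 25, 25),
    "match": (30, 15, 20, 35),
}
selected_by_chi = top_k_terms(counts, chi_square, 2)
selected_by_rf = top_k_terms(counts, lambda a, b, c, d: relevance_frequency(a, b), 2)
```

In this sketch a term that is evenly spread across both categories (such as "the" above) scores zero under CHI and IG, so it is dropped first. RF looks only at the documents that contain the term, which is why its wrapper ignores `c` and `d`.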