ELECTRONICS, vol. 15, no. 4, 2026 (SCI-Expanded, Scopus)
The proliferation of unsolicited short messages (SMS spam) poses persistent challenges to mobile communication security and user privacy. This study presents a systematic benchmark and analysis of classical machine learning approaches to SMS spam detection, focusing on the impact of text feature representation under imbalanced short-text conditions. In practical SMS filtering systems, minimizing false positives (i.e., incorrectly blocking legitimate messages) is a critical operational constraint. Therefore, beyond overall accuracy, precision and specificity are emphasized to ensure that legitimate communication is reliably preserved. Using the SMSSpamCollection dataset (5574 messages: 747 spam and 4827 ham), seven feature representation techniques were evaluated in combination with six widely adopted classifiers, yielding 42 configurations assessed under 10-fold cross-validation. The results indicate that feature representation plays a more critical role than classifier complexity. Character-level 3-grams combined with Logistic Regression achieved the best overall performance: 98.55% accuracy, 98.55% precision and 90.50% recall for the spam class (F1-score = 94.32%), and an AUC of 0.9893. Linear SVM produced comparable results, highlighting the effectiveness of linear models when paired with expressive representations. Beyond reporting performance metrics, this study analyzes feature-classifier interaction patterns and clarifies practical trade-offs among precision, recall, and computational efficiency. The findings provide reproducible baselines and structured guidance for designing efficient SMS spam filtering systems.
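As a rough illustration of two ideas in the abstract, the sketch below shows how character-level 3-grams might be counted for a single message, and how precision, recall, specificity, accuracy, and F1 follow from a confusion matrix with spam as the positive class. It is a minimal standard-library sketch, not the study's pipeline: the function names `char_ngrams` and `spam_metrics`, the lowercasing step, and the example counts are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in one message.

    Lowercasing is an assumption of this sketch; the abstract does not
    describe the preprocessing that was actually used.
    """
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spam_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute evaluation metrics with spam as the positive class.

    tp: spam flagged as spam      fp: ham flagged as spam (blocked ham)
    tn: ham passed as ham         fn: spam passed as ham (missed spam)
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical message and hypothetical counts, not results from the paper.
    print(char_ngrams("Free entry!!").most_common(5))
    print(spam_metrics(tp=90, fp=2, tn=898, fn=10))
```

With counts like these, a small number of false positives keeps precision and specificity high even when recall is lower, which is the trade-off the abstract emphasizes for preserving legitimate messages.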