KNOWLEDGE-BASED SYSTEMS, vol. 342, 2026 (SCI-Expanded, Scopus)
This study addresses class imbalance in machine learning classification tasks, where the data distribution is unequal across classes. Existing resampling methods, which balance classes through random oversampling or undersampling, can introduce noise or cause information loss. To overcome these limitations, we propose the Uncertainty-Guided Oversampling (UGO) method. UGO leverages active-learning principles to select informative data points for synthetic data generation, employing a modified pool-based sampling strategy to select informative samples automatically. First, UGO identifies noisy samples with high aleatoric uncertainty before data generation. It then focuses on non-noisy samples with high epistemic uncertainty to ensure informative sample selection. UGO determines the amount of data to generate by comparing the average epistemic uncertainty between classes, eliminating the need for a predefined oversampling ratio. Furthermore, UGO prevents the introduction of new noise by relocating newly generated noisy samples. This iterative process continues until the class epistemic uncertainties are no longer statistically different. The proposed UGO method is evaluated on 2 simulated and 74 benchmark datasets and compared with other resampling techniques. UGO achieves statistically significant improvements in the imbalanced accuracy measure (IAM), Matthews correlation coefficient (MCC), and F1-score. Overall, UGO offers a classifier-specific approach to handling class imbalance: it strategically selects informative samples for synthetic data generation based on epistemic uncertainty, prevents the introduction of noise, and measures imbalance without depending on the number of samples.
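The loop described in the abstract can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the abstract does not specify the uncertainty estimator, the generator, or the statistical test, so this version assumes a random-forest ensemble (per-tree probabilities give an entropy-based aleatoric/epistemic split), SMOTE-style interpolation between minority samples, and a Mann-Whitney U test as the stopping criterion. All function names and thresholds here are hypothetical.

```python
# Hedged sketch of an uncertainty-guided oversampling loop in the spirit of
# UGO. The uncertainty decomposition, the interpolation-based generator, and
# the Mann-Whitney stopping rule are illustrative assumptions, not the
# paper's exact components.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier

def ensemble_uncertainties(forest, X):
    """Split predictive uncertainty into aleatoric and epistemic parts
    using the per-tree class probabilities of a fitted random forest."""
    # Per-tree probabilities: shape (n_trees, n_samples, n_classes).
    probs = np.stack([t.predict_proba(X) for t in forest.estimators_])
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=1)          # entropy of mean
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    epistemic = total - aleatoric                                 # mutual information
    return aleatoric, epistemic

def ugo_sketch(X, y, minority=1, noise_q=0.9, max_iter=10, seed=0):
    """Iteratively oversample the minority class at high-epistemic points,
    skipping likely-noisy (high-aleatoric) samples, until class epistemic
    uncertainties are no longer statistically different."""
    rng = np.random.default_rng(seed)
    X, y = X.copy(), y.copy()
    for _ in range(max_iter):
        forest = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
        alea, epi = ensemble_uncertainties(forest, X)
        # Discard likely-noisy points (high aleatoric uncertainty) from the pool.
        keep = alea <= np.quantile(alea, noise_q)
        min_mask = keep & (y == minority)
        maj_mask = keep & (y != minority)
        # Stop once class epistemic uncertainties are statistically similar.
        if mannwhitneyu(epi[min_mask], epi[maj_mask]).pvalue > 0.05:
            break
        # Select the most informative (high-epistemic) minority samples and
        # generate SMOTE-style interpolations toward random minority mates.
        idx = np.flatnonzero(min_mask)
        seeds = idx[np.argsort(epi[idx])[-5:]]
        mates = rng.choice(idx, size=len(seeds))
        lam = rng.uniform(size=(len(seeds), 1))
        X_new = X[seeds] + lam * (X[mates] - X[seeds])
        X = np.vstack([X, X_new])
        y = np.concatenate([y, np.full(len(X_new), minority)])
    return X, y

# Tiny demo on a synthetic 200-vs-30 imbalanced dataset.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1.5, 1, (30, 2))])
y = np.array([0] * 200 + [1] * 30)
X_res, y_res = ugo_sketch(X, y)
```

Note the two design points the abstract emphasizes: no oversampling ratio is supplied (the statistical test on class epistemic uncertainties decides when to stop), and high-aleatoric points are excluded before generation so that noise is not propagated into the synthetic samples. The paper's noise-relocation step for newly generated samples is omitted here for brevity.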