KNOWLEDGE-BASED SYSTEMS, vol. 342, 2026 (SCI-Expanded, Scopus)
This study addresses class imbalance in machine learning classification tasks, where the data distribution is unequal across classes. Existing resampling methods, which balance classes through random oversampling or undersampling, can introduce noise or cause information loss. To overcome these limitations, we propose the Uncertainty-Guided Oversampling (UGO) method. UGO leverages active-learning principles to select informative data points for synthetic data generation, employing a modified pool-based sampling strategy to select informative samples automatically. First, UGO identifies noisy samples with high aleatoric uncertainty before data generation. It then focuses on non-noisy samples with high epistemic uncertainty to ensure informative sample selection. UGO determines the amount of data to generate by comparing the average epistemic uncertainty between classes, eliminating the need for a predefined oversampling ratio. Furthermore, UGO prevents the introduction of new noise by relocating newly generated noisy samples. This iterative process continues until the class epistemic uncertainties are no longer statistically different. The proposed UGO method is evaluated on 2 simulated and 74 benchmark datasets and compared with other resampling techniques. UGO achieves statistically significant improvements in the imbalanced accuracy measure (IAM), Matthews correlation coefficient (MCC), and F1-score. Overall, UGO offers a classifier-specific approach to handling class imbalance: it strategically selects informative samples for synthetic data generation based on epistemic uncertainty, prevents the introduction of noise, and measures imbalance without depending on the number of samples.
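The loop described in the abstract can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the abstract does not specify the uncertainty estimator, the generator, or the statistical test, so this version assumes a random-forest ensemble (per-tree probabilities give an entropy-based aleatoric/epistemic split), SMOTE-style interpolation between minority samples, and a Mann-Whitney U test as the stopping criterion. All function names and thresholds here are hypothetical.

```python
# Hedged sketch of an uncertainty-guided oversampling loop in the spirit of
# UGO. The uncertainty decomposition, the interpolation-based generator, and
# the Mann-Whitney stopping rule are illustrative assumptions, not the
# paper's exact components.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.ensemble import RandomForestClassifier

def ensemble_uncertainties(forest, X):
    """Split predictive uncertainty into aleatoric and epistemic parts
    using the per-tree class probabilities of a fitted random forest."""
    # Per-tree probabilities: shape (n_trees, n_samples, n_classes).
    probs = np.stack([t.predict_proba(X) for t in forest.estimators_])
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum(axis=1)          # entropy of mean
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=2).mean(axis=0)
    epistemic = total - aleatoric                                 # mutual information
    return aleatoric, epistemic

def ugo_sketch(X, y, minority=1, noise_q=0.9, max_iter=10, seed=0):
    """Iteratively oversample the minority class at high-epistemic points,
    skipping likely-noisy (high-aleatoric) samples, until class epistemic
    uncertainties are no longer statistically different."""
    rng = np.random.default_rng(seed)
    X, y = X.copy(), y.copy()
    for _ in range(max_iter):
        forest = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X, y)
        alea, epi = ensemble_uncertainties(forest, X)
        # Discard likely-noisy points (high aleatoric uncertainty) from the pool.
        keep = alea <= np.quantile(alea, noise_q)
        min_mask = keep & (y == minority)
        maj_mask = keep & (y != minority)
        # Stop once class epistemic uncertainties are statistically similar.
        if mannwhitneyu(epi[min_mask], epi[maj_mask]).pvalue > 0.05:
            break
        # Select the most informative (high-epistemic) minority samples and
        # generate SMOTE-style interpolations toward random minority mates.
        idx = np.flatnonzero(min_mask)
        seeds = idx[np.argsort(epi[idx])[-5:]]
        mates = rng.choice(idx, size=len(seeds))
        lam = rng.uniform(size=(len(seeds), 1))
        X_new = X[seeds] + lam * (X[mates] - X[seeds])
        X = np.vstack([X, X_new])
        y = np.concatenate([y, np.full(len(X_new), minority)])
    return X, y

# Tiny demo on a synthetic 200-vs-30 imbalanced dataset.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1.5, 1, (30, 2))])
y = np.array([0] * 200 + [1] * 30)
X_res, y_res = ugo_sketch(X, y)
```

Note the two design points the abstract emphasizes: no oversampling ratio is supplied (the statistical test on class epistemic uncertainties decides when to stop), and high-aleatoric points are excluded before generation so that noise is not propagated into the synthetic samples. The paper's noise-relocation step for newly generated samples is omitted here for brevity.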