Robust Machine Learning Models for Imbalanced Dataset Classification
Published 2026-01-07
Keywords: Imbalanced Datasets, Robust Classification, Cost-Sensitive Learning, Ensemble Learning, Resampling Techniques, Minority Class Prediction, Machine Learning Evaluation Metrics

Section: Articles

How to Cite
[1] V. Kumar and G. Rajan, “Robust Machine Learning Models for Imbalanced Dataset Classification”, IJMLPA, vol. 1, no. 1, pp. 41–53, Jan. 2026, Accessed: Mar. 02, 2026. [Online]. Available: https://worldcometresearchgroup.com/index.php/ijmlpa/article/view/63

Abstract
Class imbalance is a pervasive and difficult problem in machine learning classification, particularly in real-world tasks such as fraud detection, medical diagnosis, network intrusion detection, and fault prediction. In these tasks the minority class typically contains the events of interest, yet conventional models tend to favor the majority class, yielding misleadingly high accuracy alongside poor generalization and high misclassification costs. This paper presents an extensive study of robust machine learning techniques for classifying imbalanced datasets. It provides a systematic review of the theoretical underpinnings of imbalanced learning, surveys state-of-the-art data-level, algorithm-level, and ensemble-based methods, and proposes a unified methodology for building robust classifiers. Resampling algorithms, cost-sensitive learning, hybrid ensembles, and evaluation metrics suited to imbalanced data are discussed in detail. An organized experimental procedure is described for measuring robustness under varying imbalance ratios and noise levels. Comparative findings indicate that hybrid methods combining adaptive resampling with cost-sensitive loss functions consistently outperform simpler classifiers on F1-score, G-mean, and area under the precision-recall curve. The discussion highlights practical trade-offs among model performance, complexity, and interpretability. The paper concludes with future research directions, including scalable imbalanced learning, deep learning adaptations, and domain-aware evaluation strategies, and serves as a useful resource for researchers and practitioners seeking principled, effective solutions to imbalanced classification problems.
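To make two of the abstract's recurring ideas concrete, here is a minimal pure-Python sketch (not taken from the paper) of naive random oversampling of the minority class and of the F1-score and G-mean metrics the abstract evaluates against. Function names and the toy data are illustrative assumptions, not part of the article.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count (naive random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def imbalance_metrics(y_true, y_pred, positive=1):
    """Return (F1, G-mean) for a binary problem; both are far less
    misleading than plain accuracy under a skewed class distribution."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class accuracy
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = (recall * specificity) ** 0.5
    return f1, gmean

# Toy 4:1 imbalanced dataset: oversampling balances the label counts.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 8 samples
```

A classifier that predicts only the majority class here would score 80% accuracy but a G-mean of 0, which is exactly why the paper evaluates on F1, G-mean, and the precision-recall curve instead.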