Novel Regularization Methods to Prevent Overfitting in Machine Learning Models

  • Authors

    • Dr. Rakesh, Department of AI & Smart Systems, AKJ College of Arts and Science, Coimbatore, India.
    • Dr. Ananya, Department of AI & Smart Systems, AKJ College of Arts and Science, Coimbatore, India.

    Published 2026-01-02

  • Keywords: Overfitting, Regularization, Machine Learning, Generalization, Deep Learning, Model Robustness

  • How to Cite

    [1] Rakesh and Ananya, “Novel Regularization Methods to Prevent Overfitting in Machine Learning Models”, IJMLPA, vol. 1, no. 1, pp. 01–14, Jan. 2026. Accessed: Mar. 02, 2026. [Online]. Available: https://worldcometresearchgroup.com/index.php/ijmlpa/article/view/60
  • Abstract

    Overfitting remains one of the most persistent challenges in modern machine learning (ML), especially as model capacity and data dimensionality increase. Although classical regularization methods such as L1, L2, dropout, and early stopping have proven effective, their weaknesses become apparent in large-scale, deep, and data-sparse learning settings. This paper presents a detailed study of novel regularization techniques that aim to improve generalization while preserving model expressiveness. We present a unified taxonomy of recent regularization methods spanning adaptive regularization, information-theoretic constraints, structured sparsity, stochastic regularization, and representation-level regularization. We further propose Hybrid Adaptive Information Regularization (HAIR), which dynamically balances model complexity against generalization through entropy-based penalties and parameter-sensitivity analysis. Extensive comparative experiments show that the proposed approach outperforms traditional methods across a variety of learning paradigms. The findings emphasize the importance of advanced regularization in building robust, scalable, and interpretable ML systems. The study contributes theoretical background, methodological development, and empirical evidence in support of next-generation regularization approaches.
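
    The abstract names two ingredients of HAIR, entropy-based penalties and parameter-sensitivity analysis, without giving the full formulation, so the following is only a minimal PyTorch sketch of how those two pieces might combine in a single training step. Every name here (confidence_entropy_penalty, hair_style_step, ALPHA, BASE_L2) and the moving-average sensitivity estimate are illustrative assumptions, not the authors' implementation.

    ```python
    import torch
    import torch.nn.functional as F

    # Illustrative hyperparameters (assumed, not taken from the paper).
    ALPHA = 0.1     # strength of the entropy-based penalty
    BASE_L2 = 1e-4  # base strength of the sensitivity-weighted L2 term

    def confidence_entropy_penalty(logits):
        # Negative mean predictive entropy: minimizing this term
        # discourages over-confident, low-entropy output distributions.
        log_p = F.log_softmax(logits, dim=-1)
        entropy = -(log_p.exp() * log_p).sum(dim=-1)
        return -entropy.mean()

    def hair_style_step(model, optimizer, x, y, sensitivity):
        # One training step combining the task loss, the entropy penalty,
        # and an adaptive L2 term. `sensitivity` is a dict mapping parameter
        # names to running estimates of their squared gradients.
        logits = model(x)
        loss = F.cross_entropy(logits, y) + ALPHA * confidence_entropy_penalty(logits)

        # Parameters the loss is more sensitive to (larger historical
        # gradients) receive a proportionally stronger L2 penalty.
        for name, p in model.named_parameters():
            s = sensitivity.get(name)
            if s is not None:
                loss = loss + BASE_L2 * (s * p.pow(2)).sum()

        optimizer.zero_grad()
        loss.backward()

        # Refresh the sensitivity estimates from the new gradients.
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.grad is not None:
                    s = sensitivity.setdefault(name, torch.zeros_like(p))
                    s.mul_(0.9).add_(p.grad.pow(2), alpha=0.1)

        optimizer.step()
        return loss.item()
    ```

    In this sketch the per-parameter L2 strength tracks a running estimate of squared gradients, so the regularization pressure adapts over the course of training; this is one plausible reading of the dynamic complexity/generalization balance the abstract describes.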

