A Survey on Feature Selection Techniques for Predictive Analytics

  • Authors

    • Dr. Nesca Mthethwa, Department of Machine Learning & AI, University of Cape Town, South Africa
    • Thane Nkosi, Department of Machine Learning & AI, University of Cape Town, South Africa

    Published 2026-01-04

  • Keywords

    Feature Selection, Predictive Analytics, Dimensionality Reduction, Machine Learning, Data Mining, Classification, Regression, High-Dimensional Data


    How to Cite

    [1]
    N. Mthethwa and T. Nkosi, “A Survey on Feature Selection Techniques for Predictive Analytics”, IJMLPA, vol. 1, no. 1, pp. 15–26, Jan. 2026, Accessed: Mar. 02, 2026. [Online]. Available: https://worldcometresearchgroup.com/index.php/ijmlpa/article/view/61
  • Abstract

    Feature selection is a crucial step in predictive analytics: it identifies the subset of features in high-dimensional data that contributes most to the prediction task and removes irrelevant, redundant, or noisy features. As contemporary data-driven applications produce large volumes of heterogeneous data, dataset dimensionality keeps growing, bringing problems of overfitting, computational complexity, reduced model interpretability, and poorer generalization. Feature selection methods address these challenges by improving predictive accuracy, reducing training time, and increasing model robustness. This survey provides a systematic and extensive overview of feature selection methods used in predictive analytics across a variety of domains, including healthcare, finance, bioinformatics, cybersecurity, and smart systems. The paper classifies feature selection techniques as filter, wrapper, embedded, or hybrid, giving a thorough theoretical background for each family together with a comparative analysis. Statistical, information-theoretic, similarity-based, and probabilistic filters are discussed, along with heuristic and metaheuristic wrapper methods such as evolutionary and swarm-based algorithms. Embedded techniques that exploit regularization, decision trees, and ensemble learning are also critically analyzed. In addition, the survey covers the evaluation metrics, benchmark datasets, and experimental design considerations used to assess the effectiveness of feature selection. Practical issues such as scalability, stability, data imbalance, and interpretability are examined, as are emerging directions in deep learning-based feature selection, multi-objective optimization, and explainable artificial intelligence. This work is intended as a useful reference for researchers and practitioners who aim to build efficient, accurate, and interpretable predictive analytics systems.
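
    To make the taxonomy concrete, the following minimal Python sketch (illustrative only, not from the paper; the synthetic dataset and hyperparameters are arbitrary assumptions) shows one representative method from each of the three main families using scikit-learn: a mutual-information filter, a recursive-feature-elimination wrapper, and an L1-regularized embedded selector.

        # Illustrative sketch: one filter, one wrapper, and one embedded
        # feature-selection method. The dataset and hyperparameters are
        # arbitrary choices for demonstration, not taken from the survey.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
        from sklearn.linear_model import LogisticRegression

        # Synthetic high-dimensional data: 50 features, 8 of them informative.
        X, y = make_classification(n_samples=500, n_features=50,
                                   n_informative=8, random_state=0)

        # Filter: score each feature by mutual information with the label,
        # independently of any downstream model, and keep the top k.
        filter_sel = SelectKBest(mutual_info_classif, k=8).fit(X, y)
        filter_idx = np.flatnonzero(filter_sel.get_support())

        # Wrapper: recursive feature elimination repeatedly refits a model
        # and discards the weakest features until k remain.
        wrapper_sel = RFE(LogisticRegression(max_iter=1000),
                          n_features_to_select=8).fit(X, y)
        wrapper_idx = np.flatnonzero(wrapper_sel.get_support())

        # Embedded: L1 (Lasso-style) regularization drives irrelevant
        # coefficients to exactly zero during training itself.
        embedded_model = LogisticRegression(penalty="l1", solver="liblinear",
                                            C=0.1).fit(X, y)
        embedded_idx = np.flatnonzero(embedded_model.coef_)

        print("filter:  ", filter_idx)
        print("wrapper: ", wrapper_idx)
        print("embedded:", embedded_idx)

    In practice the three families trade off differently: filters are fastest but ignore the downstream model, wrappers tend to be most accurate for a given model but are computationally expensive, and embedded methods fall in between by folding selection into training.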

