Student Dropout Prediction Using Ensemble Learning with SHAP-Based Explainable AI Analysis
Abstract
Student dropout prediction is a critical challenge in higher education that requires accurate identification of at-risk students to enable timely interventions. This study presents EASE-Predict (Ensemble-SHAP Explainable Student Prediction), a comprehensive ensemble learning framework with SHAP-based explainable AI to predict student academic outcomes. We evaluated five machine learning algorithms (Random Forest, Gradient Boosting, Extra Trees, Logistic Regression, and SVM) and developed voting and stacking ensemble models on a dataset of 4,424 students with 36 features encompassing academic performance, socioeconomic factors, and demographic information. EASE-Predict achieved superior performance with 77.4% accuracy, representing a statistically significant improvement of 4.3 percentage points over the best individual model (Random Forest: 77.3%). The framework demonstrated exceptional class-specific discriminative performance with AUC scores of 0.930 for Graduate prediction (vs. 0.927 for the best individual model), 0.821 for Enrolled students (vs. 0.794 for SVM), and 0.913 for Dropout identification (vs. 0.904 for individual models). Cross-validation results showed superior stability with the lowest performance variance (σ = 0.014 vs. σ = 0.0189 for Random Forest). SHAP explainability analysis quantified feature importance, revealing that completion of second-semester curricular units accounts for 60% of prediction influence, followed by tuition payment status (35%) and scholarship availability (12%). McNemar’s statistical tests confirmed that EASE-Predict’s performance improvements are statistically significant (p < 0.05) across all evaluation metrics. The framework maintains interpretability while achieving state-of-the-art accuracy, providing educational institutions with actionable insights for implementing evidence-based intervention strategies.
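The abstract reports McNemar's tests comparing the ensemble with individual models. As a rough illustration of how such a paired comparison works, the sketch below implements the exact (binomial) form of McNemar's test on two sets of predictions for the same test students. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which variant of the test was used, and the function name `mcnemar_exact`, the variable names, and the toy labels are all hypothetical.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples only model A classified
    correctly and c counts samples only model B classified correctly.
    Assumption: the exact binomial variant is used here for illustration.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, a, m in zip(y_true, pred_a, pred_b):
        a_ok, m_ok = (a == truth), (m == truth)
        if a_ok and not m_ok:
            b += 1  # A right, B wrong
        elif m_ok and not a_ok:
            c += 1  # B right, A wrong

    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness

    # Under H0 each discordant pair favors either model with probability 0.5,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail, cap at 1.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return b, c, min(1.0, 2.0 * tail)


# Hypothetical toy data: 12 students, three outcome classes.
y_true = ["Graduate"] * 4 + ["Dropout"] * 4 + ["Enrolled"] * 4

ensemble_pred = list(y_true)
ensemble_pred[8] = "Graduate"            # the ensemble's only error

baseline_pred = list(y_true)
for i in range(8):
    baseline_pred[i] = "Enrolled"        # baseline errors on the first 8

b, c, p_value = mcnemar_exact(y_true, ensemble_pred, baseline_pred)
print(f"only ensemble correct: {b}, only baseline correct: {c}, p = {p_value:.4f}")
```

Only the discordant pairs (students that exactly one model classifies correctly) enter the test, which is why two models with very similar accuracy can still differ significantly, or fail to, depending on how their errors overlap.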
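Keywords
student dropout prediction, ensemble learning, explainable AI, SHAP analysis, educational data mining, machine learning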
Cite This Article
TY  - JOUR
AU  - Liu, Ziyang
AU  - Zhou, Xiang
AU  - Liu, Yijun
PY  - 2025
DA  - 2025/08/06
TI  - Student Dropout Prediction Using Ensemble Learning with SHAP-Based Explainable AI Analysis
JO  - Journal of Social Systems and Policy Analysis
T2  - Journal of Social Systems and Policy Analysis
JF  - Journal of Social Systems and Policy Analysis
VL  - 2
IS  - 3
SP  - 111
EP  - 132
DO  - 10.62762/JSSPA.2025.321501
UR  - https://www.icck.org/article/abs/JSSPA.2025.321501
KW  - student dropout prediction
KW  - ensemble learning
KW  - explainable AI
KW  - SHAP analysis
KW  - educational data mining
KW  - machine learning
SN  - 3068-5540
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -
@article{Liu2025Student,
author = {Ziyang Liu and Xiang Zhou and Yijun Liu},
title = {Student Dropout Prediction Using Ensemble Learning with SHAP-Based Explainable AI Analysis},
journal = {Journal of Social Systems and Policy Analysis},
year = {2025},
volume = {2},
number = {3},
pages = {111-132},
doi = {10.62762/JSSPA.2025.321501},
url = {https://www.icck.org/article/abs/JSSPA.2025.321501},
keywords = {student dropout prediction, ensemble learning, explainable AI, SHAP analysis, educational data mining, machine learning},
issn = {3068-5540},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.