Volume 2, Issue 3, Journal of Social Systems and Policy Analysis
Volume 2, Issue 3, 2025
Journal of Social Systems and Policy Analysis, Volume 2, Issue 3, 2025: 111-132

Free to Read | Research Article | 06 August 2025
Student Dropout Prediction Using Ensemble Learning with SHAP-Based Explainable AI Analysis
1 School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China
2 School of Information Engineering, Minzu University of China, Beijing 100081, China
* Corresponding Author: Xiang Zhou, [email protected]
Received: 05 June 2025, Accepted: 18 June 2025, Published: 06 August 2025  
Abstract
Student dropout prediction is a critical challenge in higher education: at-risk students must be identified accurately so that interventions can be timely. This study presents EASE-Predict (Ensemble-SHAP Explainable Student Prediction), an ensemble learning framework with SHAP-based explainable AI for predicting student academic outcomes. We evaluated five machine learning algorithms (Random Forest, Gradient Boosting, Extra Trees, Logistic Regression, and SVM) and developed voting and stacking ensemble models on a dataset of 4,424 students with 36 features covering academic performance, socioeconomic factors, and demographic information. EASE-Predict achieved 77.4% accuracy, representing a statistically significant improvement of 4.3 percentage points over the best individual model (Random Forest: 77.3%). The framework also showed strong class-specific discriminative performance, with AUC scores of 0.930 for Graduate prediction (vs. 0.927 for the best individual model), 0.821 for Enrolled students (vs. 0.794 for SVM), and 0.913 for Dropout identification (vs. 0.904 for individual models). Cross-validation results showed the greatest stability, with the lowest performance variability (σ = 0.014 vs. σ = 0.0189 for Random Forest). SHAP explainability analysis quantified feature importance, revealing that completion of second-semester curricular units accounts for 60% of prediction influence, followed by tuition payment status (35%) and scholarship availability (12%). McNemar’s tests confirmed that EASE-Predict’s performance improvements are statistically significant (p < 0.05) across all evaluation metrics. The framework maintains interpretability while achieving state-of-the-art accuracy, providing educational institutions with actionable insights for implementing evidence-based intervention strategies.
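Two of the techniques named in the abstract, combining several classifiers by voting and comparing two classifiers with McNemar's test, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `soft_vote` and `mcnemar_exact` are hypothetical, soft (probability-averaging) voting and the exact binomial form of the test are assumed because the abstract specifies neither variant, and only the three class labels come from the abstract.

```python
from math import comb

# Outcome classes named in the abstract.
CLASSES = ("Dropout", "Enrolled", "Graduate")


def soft_vote(prob_rows):
    """Average per-class probabilities from several models.

    prob_rows holds one probability vector per model, ordered as CLASSES.
    Returns the winning class label and the averaged probabilities.
    """
    n_models = len(prob_rows)
    avg = [sum(row[k] for row in prob_rows) / n_models
           for k in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=avg.__getitem__)
    return CLASSES[best], avg


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    b counts cases only model A classifies correctly,
    c counts cases only model B classifies correctly.
    Under the null hypothesis the discordant cases split 50/50,
    so the p-value is a binomial tail probability.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the two models never disagree on correctness
    k = min(b, c)
    p_value = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, p_value)


if __name__ == "__main__":
    # Made-up probabilities from three hypothetical base models for one student.
    label, avg = soft_vote([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.1, 0.4]])
    print(label, avg)

    # Made-up paired predictions for twelve students (illustrative only).
    y_true = [1] * 12
    pred_a = [1] * 10 + [0, 0]
    pred_b = [0] * 8 + [1, 1, 1, 0]
    print(mcnemar_exact(y_true, pred_a, pred_b))
```

The test uses only the cases on which the two models disagree about correctness, which is why it suits paired comparisons on a single held-out set. In practice a library implementation would normally replace both helpers.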

Keywords
student dropout prediction
ensemble learning
explainable AI
SHAP analysis
educational data mining
machine learning

Data Availability Statement
Data will be made available on request.

Funding
This work received no funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Liu, Z., Zhou, X., & Liu, Y. (2025). Student Dropout Prediction Using Ensemble Learning with SHAP-Based Explainable AI Analysis. Journal of Social Systems and Policy Analysis, 2(3), 111–132. https://doi.org/10.62762/JSSPA.2025.321501

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Journal of Social Systems and Policy Analysis

ISSN: 3068-5540 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/