Volume 1, Issue 1, ICCK Transactions on Educational Data Mining
Volume 1, Issue 1, 2025
ICCK Transactions on Educational Data Mining, Volume 1, Issue 1, 2025: 36-43

Research Article | 25 December 2025
Enhancing Student Dropout and Academic Success Prediction Using Machine Learning and Over-sampling Techniques
1 Department of Civil and Industrial Engineering (DICI), University of Pisa, Pisa 56122, Italy
2 International Doctorate in Civil and Environmental Engineering, University of Florence, Florence 50139, Italy
3 Laboratory of Accident Mechanism Analysis (LMA), Université Gustave Eiffel, Salon-de-Provence 13300, France
4 Xiamen Institute of Technology, Xiamen 361024, China
* Corresponding Author: Chenxi Wang, [email protected]
ARK: ark:/57805/tedm.2025.732573
Received: 29 November 2025, Accepted: 17 December 2025, Published: 25 December 2025  
Abstract
Predicting student dropout and academic success is important for higher education institutions seeking to enhance retention and deliver timely interventions. However, educational datasets often exhibit severe class imbalance, particularly when multiple academic outcomes (i.e., dropout, enrolled, and graduate) are considered simultaneously. This study therefore examines the effectiveness of three widely used over-sampling techniques, namely RandomOverSampler, the synthetic minority over-sampling technique (SMOTE), and adaptive synthetic sampling (ADASYN), for mitigating class imbalance and enhancing prediction performance. These sampling strategies are evaluated in combination with several machine learning classifiers to assess their influence on the accuracy of minority-class detection. The experimental results show that appropriate over-sampling substantially improves model performance, especially for the minority categories. The findings highlight the critical role of imbalance-handling techniques in educational data mining and offer practical insights for institutions seeking to build robust early-warning systems.
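To make the over-sampling idea in the abstract concrete, the following is a minimal sketch of two of the named strategies: random over-sampling (duplicating minority samples) and SMOTE-style interpolation between same-class neighbours. It is an illustration only and not the authors' implementation. The function names `random_oversample` and `smote_like`, the neighbour count `k`, and the toy two-feature data are all assumptions introduced here. ADASYN, which additionally weights generation toward harder-to-learn samples, is not sketched. Library implementations of all three are available in the `imbalanced-learn` package under `imblearn.over_sampling`.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(pool)))  # exact copy of a real sample
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, k=3, seed=0):
    """SMOTE-style over-sampling: each synthetic point lies on the segment
    between a minority sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(pool) < 2:
            continue  # majority class, or too few samples to interpolate
        for _ in range(target - n):
            base = rng.choice(pool)
            # k nearest neighbours of `base` within the same class
            others = [p for p in pool if p is not base]
            neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # interpolation weight in [0, 1)
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


# Toy, made-up data with the three outcome labels named in the abstract.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3],
     [4.0, 0.0], [4.2, 0.1]]
y = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 2

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y)
print(Counter(y_ros))
print(Counter(y_sm))
```

In both cases every class ends up with as many samples as the largest one (six in this toy set). In practice, over-sampling is usually applied to the training split only, so that duplicated or synthetic samples do not leak into the evaluation data.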

Keywords
student dropout prediction
over-sampling
SMOTE
ADASYN
academic success

Data Availability Statement
Data will be made available on request.

Funding
This work received no funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Wang, C., & Yao, S. (2025). Enhancing Student Dropout and Academic Success Prediction Using Machine Learning and Over-sampling Techniques. ICCK Transactions on Educational Data Mining, 1(1), 36–43. https://doi.org/10.62762/TEDM.2025.732573

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.