ICCK Transactions on Educational Data Mining, Volume 1, Issue 1, 2025: 16-24

Research Article | 25 November 2025
A Stacking-Based RF-CatBoost Model for Student Performance Prediction
by Jing Zhao 1,*
1 School of Environment and Public Health, Xiamen Huaxia University, Xiamen 361024, China
* Corresponding Author: Jing Zhao, [email protected]
ARK: ark:/57805/tedm.2025.397583
Received: 27 October 2025, Accepted: 13 November 2025, Published: 25 November 2025  
Abstract
To address the student performance prediction problem in educational data mining, this study proposes a stacking-based RF-CatBoost model that integrates the complementary strengths of ensemble learning methods to enhance prediction accuracy and robustness. In the proposed framework, Random Forest (RF) and CatBoost are employed as base learners to capture both global feature interactions and complex non-linear relationships within multi-source educational data. Their outputs are then stacked and fused using a combination strategy to generate the final prediction. Experimental results on two educational datasets demonstrate that the stacking-based RF-CatBoost model consistently achieves superior predictive performance in terms of both accuracy and robustness. The results confirm that the proposed hybrid stacking RF-CatBoost model effectively leverages the diversity of ensemble learners, offering a robust solution for early student performance prediction and enabling timely interventions and personalized learning support in educational settings.
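
The abstract describes a stacking architecture in which RF and CatBoost serve as base learners and their outputs are fused by a combination strategy. The minimal Python sketch below illustrates that general architecture using scikit-learn's StackingClassifier together with the catboost package; the meta-learner (logistic regression), the hyperparameters, the file name student_performance.csv, and the label column "passed" are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a stacking ensemble with RF and CatBoost base learners.
# Assumptions (not specified in the abstract): logistic-regression meta-learner,
# 5-fold cross-validated stacking, and a hypothetical tabular dataset / label column.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from catboost import CatBoostClassifier

# Hypothetical dataset: one row per student, binary "passed" label.
data = pd.read_csv("student_performance.csv")        # assumed file name
X = pd.get_dummies(data.drop(columns=["passed"]))    # assumed label column
y = data["passed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base learners: Random Forest for global feature interactions,
# CatBoost for complex non-linear relationships.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
    ("catboost", CatBoostClassifier(iterations=300, depth=6, verbose=0, random_state=42)),
]

# Stacking: base-learner predictions are fused by a meta-learner (the combination strategy).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, stack.predict(X_test)))
```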

Graphical Abstract
A Stacking-Based RF-CatBoost Model for Student Performance Prediction

Keywords
stacking
random forest
CatBoost
student performance prediction

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Zhao, J. (2025). A Stacking-Based RF-CatBoost Model for Student Performance Prediction. ICCK Transactions on Educational Data Mining, 1(1), 16–24. https://doi.org/10.62762/TEDM.2025.397583
Export Citation
RIS Format
Compatible with EndNote, Zotero, Mendeley, and other reference managers
TY  - JOUR
AU  - Zhao, Jing
PY  - 2025
DA  - 2025/11/25
TI  - A Stacking-Based RF-CatBoost Model for Student Performance Prediction
JO  - ICCK Transactions on Educational Data Mining
T2  - ICCK Transactions on Educational Data Mining
JF  - ICCK Transactions on Educational Data Mining
VL  - 1
IS  - 1
SP  - 16
EP  - 24
DO  - 10.62762/TEDM.2025.397583
UR  - https://www.icck.org/article/abs/TEDM.2025.397583
KW  - stacking
KW  - random forest
KW  - CatBoost
KW  - student performance prediction
AB  - To address the student performance prediction problem in educational data mining, this study proposes a stacking-based RF-CatBoost model that integrates the complementary strengths of ensemble learning methods to enhance prediction accuracy and robustness. In the proposed framework, Random Forest (RF) and CatBoost are employed as base learners to capture both global feature interactions and complex non-linear relationships within multi-source educational data. Their outputs are then stacked and fused using a combination strategy to generate the final prediction. Experimental results on two educational datasets demonstrate that the stacking-based RF-CatBoost model consistently achieves superior predictive performance in terms of both accuracy and robustness. The results confirm that the proposed hybrid stacking RF-CatBoost model effectively leverages the diversity of ensemble learners, offering a robust solution for early student performance prediction and enabling timely interventions and personalized learning support in educational settings.
SN  - pending
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
BibTeX Format
Compatible with LaTeX, BibTeX, and other reference managers
@article{Zhao2025A,
  author = {Jing Zhao},
  title = {A Stacking-Based RF-CatBoost Model for Student Performance Prediction},
  journal = {ICCK Transactions on Educational Data Mining},
  year = {2025},
  volume = {1},
  number = {1},
  pages = {16-24},
  doi = {10.62762/TEDM.2025.397583},
  url = {https://www.icck.org/article/abs/TEDM.2025.397583},
  abstract = {To address the student performance prediction problem in educational data mining, this study proposes a stacking-based RF-CatBoost model that integrates the complementary strengths of ensemble learning methods to enhance prediction accuracy and robustness. In the proposed framework, Random Forest (RF) and CatBoost are employed as base learners to capture both global feature interactions and complex non-linear relationships within multi-source educational data. Their outputs are then stacked and fused using a combination strategy to generate the final prediction. Experimental results on two educational datasets demonstrate that the stacking-based RF-CatBoost model consistently achieves superior predictive performance in terms of both accuracy and robustness. The results confirm that the proposed hybrid stacking RF-CatBoost model effectively leverages the diversity of ensemble learners, offering a robust solution for early student performance prediction and enabling timely interventions and personalized learning support in educational settings.},
  keywords = {stacking, random forest, CatBoost, student performance prediction},
  issn = {pending},
  publisher = {Institute of Central Computation and Knowledge}
}

Article Metrics
Citations: Crossref: 0 | Scopus: 0 | Web of Science: 0
Article Access Statistics: Views: 261 | PDF Downloads: 81

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Educational Data Mining

ISSN: pending (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/