ICCK Transactions on Educational Data Mining, Volume 1, Issue 1, 2025: 25-35

Free to Read | Research Article | 28 November 2025
A Gradient Boosting-Based Feature Selection Framework for Predicting Student Performance
1 Xiamen Institute of Software Technology, Xiamen 361024, China
* Corresponding Author: Shaoyuan Weng, [email protected]
Received: 14 October 2025, Accepted: 22 October 2025, Published: 28 November 2025  
Abstract
In educational data mining, accurately predicting student performance is important for supporting timely intervention for at-risk students. However, educational datasets often include irrelevant or redundant features that can degrade the performance of prediction models. To tackle this issue, this study proposes a gradient boosting-based feature selection framework that automatically identifies and retains the most important features for student performance prediction. The framework leverages a gradient boosting model to calculate feature importance and refine the feature subset, aiming to achieve comparable or superior prediction performance with fewer, more informative input features. To ensure a robust evaluation, we apply a 10-fold cross-validation strategy with 10 repetitions. Experimental results on the Mathematics and Portuguese Language course datasets demonstrate that the proposed framework consistently outperforms the baseline models across the evaluation metrics used. These findings highlight the effectiveness of the proposed feature selection approach for student performance prediction, making it a reliable tool for data-driven educational analytics.
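The pipeline described in the abstract can be sketched roughly as follows: fit a gradient boosting model, rank features by importance, keep a reduced subset, and evaluate with 10-fold cross-validation repeated 10 times. This is a minimal illustration using scikit-learn on synthetic data; the top-k cutoff, hyperparameters, and regression setting are assumptions for the sketch, not the paper's actual configuration.

```python
# Hypothetical sketch of gradient boosting-based feature selection
# with repeated 10-fold cross-validation (scikit-learn).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic stand-in for a student-performance dataset:
# 30 features, of which only 8 are actually informative.
X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)

# Step 1: fit a gradient boosting model on all features to obtain
# impurity-based feature importances.
gbm = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
importances = gbm.feature_importances_

# Step 2: keep the k highest-ranked features (k is an arbitrary
# choice here; the paper's refinement rule may differ).
k = 8
top_k = np.argsort(importances)[::-1][:k]
X_selected = X[:, top_k]

# Step 3: evaluate full vs. reduced feature sets with 10-fold
# cross-validation repeated 10 times (100 scores per setting).
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=0)
scores_full = cross_val_score(GradientBoostingRegressor(n_estimators=50, random_state=0),
                              X, y, cv=cv, scoring="r2")
scores_sel = cross_val_score(GradientBoostingRegressor(n_estimators=50, random_state=0),
                             X_selected, y, cv=cv, scoring="r2")
print(f"All 30 features: mean R^2 = {scores_full.mean():.3f}")
print(f"Top {k} features: mean R^2 = {scores_sel.mean():.3f}")
```

In practice the selection step can be iterated, re-fitting on the reduced subset and re-ranking until performance stops improving, which is closer to the refinement loop the abstract describes.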

Graphical Abstract
A Gradient Boosting-Based Feature Selection Framework for Predicting Student Performance

Keywords
feature selection
gradient boosting
student performance prediction
educational data mining

Data Availability Statement
Data will be made available on request.

Funding
This work was supported by the Fujian Provincial Young and Middle-aged Teachers' Educational Research Project (Science and Technology Category), China under Grant JAT241390.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Weng, S., Zheng, Y., Zhang, C., & Liu, Z. (2025). A Gradient Boosting-Based Feature Selection Framework for Predicting Student Performance. ICCK Transactions on Educational Data Mining, 1(1), 25–35. https://doi.org/10.62762/TEDM.2025.414136

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Educational Data Mining

ISSN: pending (Online)

Email: [email protected]

All published articles are permanently preserved in Portico:
https://www.portico.org/publishers/icck/