Volume 2, Issue 1, ICCK Transactions on Educational Data Mining
ICCK Transactions on Educational Data Mining, Volume 2, Issue 1, 2026: 1-13

Free to Read | Research Article | 28 February 2026
KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction
1 Fuzhou Technology and Business University, Fuzhou 350715, China
* Corresponding Author: Zhihong Yu, [email protected]
ARK: ark:/57805/tedm.2026.459733
Received: 03 February 2026, Accepted: 25 February 2026, Published: 28 February 2026  
Abstract
Student performance prediction is a core task in educational data mining, as it enables early intervention, personalized learning support, and data-driven decision-making. Although existing machine learning models have shown promising results in this domain, challenges persist due to hard-to-classify samples—particularly students exhibiting borderline performance—and the discrete nature of hard labels, which together limit predictive effectiveness. To overcome these limitations, this paper proposes a KFWAdaBoost-based soft label learning framework that systematically enhances baseline model performance through a two-stage synergistic mechanism. In the first stage, K-means++ clustering is employed to generate similarity features, thereby providing structural awareness of underlying data patterns. In the second stage, probabilistic soft labels are derived from ensemble confidence scores to refine decision boundaries and better handle ambiguous cases. Experimental results on the widely used Mathematics and Portuguese Language course datasets demonstrate that the proposed framework consistently improves baseline performance across Accuracy, Precision, Recall, and F1-Score for models including LDA, Decision Tree, and SVM, with Decision Tree exhibiting the most substantial gains. This framework offers a reliable and effective approach for student performance prediction and holds strong potential for broader applications in educational data analytics.
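The two stages described above can be illustrated with a minimal, standard-library sketch. It is not the paper's KFWAdaBoost implementation, whose details the abstract does not give. Two forms are assumed purely for illustration: a similarity feature is taken to be exp(-distance) to each K-means++-seeded centroid, and a soft label is taken to be the weighted share of ensemble votes for the positive class. The function names and the toy points are hypothetical.

```python
import math
import random


def kmeanspp_centroids(points, k, iters=10, seed=0):
    """Run K-means with K-means++ seeding; return k centroids as tuples."""
    rng = random.Random(seed)
    centroids = [tuple(rng.choice(points))]
    while len(centroids) < k:
        # K-means++ seeding: pick the next centroid with probability
        # proportional to squared distance from the nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(tuple(rng.choices(points, weights=d2, k=1)[0]))
    for _ in range(iters):
        # Lloyd step: assign each point to its nearest centroid ...
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # ... then move each centroid to the mean of its group.
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(group) for col in zip(*group))
    return centroids


def similarity_features(point, centroids):
    """Stage 1 (assumed form): append exp(-distance) to every centroid."""
    return list(point) + [math.exp(-math.dist(point, c)) for c in centroids]


def soft_label(votes, weights):
    """Stage 2 (assumed form): weighted share of ensemble votes for class 1."""
    return sum(w for v, w in zip(votes, weights) if v == 1) / sum(weights)


# Toy, made-up 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids = kmeanspp_centroids(points, k=2)
feats = similarity_features(points[0], centroids)  # 2 raw + 2 similarity values

# Three hypothetical weak learners vote 1, 0, 1 with weights 2, 1, 1.
label = soft_label([1, 0, 1], [2.0, 1.0, 1.0])  # 0.75 instead of a hard 1
```

In this sketch a borderline student keeps an intermediate target such as 0.75 rather than being forced to 0 or 1, which is the intuition behind using soft labels for ambiguous cases.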

Keywords
soft label learning
KFWAdaBoost
K-means++ clustering
student performance prediction
educational data mining

Data Availability Statement
Data will be made available on request.

Funding
This work received no funding.

Conflicts of Interest
The authors declare no conflicts of interest.

AI Use Statement
The authors declare that no generative AI was used in the preparation of this manuscript.

Ethical Approval and Consent to Participate
This study uses a public, anonymized UCI dataset with no direct human involvement. Ethics approval and informed consent are not required.

Cite This Article
APA Style
Yu, Z. (2026). KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction. ICCK Transactions on Educational Data Mining, 2(1), 1–13. https://doi.org/10.62762/TEDM.2026.459733
Export Citation
RIS Format
Compatible with EndNote, Zotero, Mendeley, and other reference managers
TY  - JOUR
AU  - Yu, Zhihong
PY  - 2026
DA  - 2026/02/28
TI  - KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction
JO  - ICCK Transactions on Educational Data Mining
T2  - ICCK Transactions on Educational Data Mining
JF  - ICCK Transactions on Educational Data Mining
VL  - 2
IS  - 1
SP  - 1
EP  - 13
DO  - 10.62762/TEDM.2026.459733
UR  - https://www.icck.org/article/abs/TEDM.2026.459733
KW  - soft label learning
KW  - KFWAdaBoost
KW  - K-means++ clustering
KW  - student performance prediction
KW  - educational data mining
SN  - 3070-5843
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
BibTeX Format
Compatible with LaTeX, BibTeX, and other reference managers
@article{Yu2026KFWAdaBoos,
  author = {Zhihong Yu},
  title = {KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction},
  journal = {ICCK Transactions on Educational Data Mining},
  year = {2026},
  volume = {2},
  number = {1},
  pages = {1-13},
  doi = {10.62762/TEDM.2026.459733},
  url = {https://www.icck.org/article/abs/TEDM.2026.459733},
  keywords = {soft label learning, KFWAdaBoost, K-means++ clustering, student performance prediction, educational data mining},
  issn = {3070-5843},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Educational Data Mining

ISSN: 3070-5843 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/