KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction

Zhihong Yu

doi:10.62762/TEDM.2026.459733

Volume 2, Issue 1, ICCK Transactions on Educational Data Mining

Volume 2, Issue 1, 2026

ICCK Transactions on Educational Data Mining, Volume 2, Issue 1, 2026: 1-13

Free to Read | Research Article | 28 February 2026

Zhihong Yu 1 *

1 Fuzhou Technology and Business University, Fuzhou 350715, China

* Corresponding Author: Zhihong Yu, [email protected]

DOI: 10.62762/TEDM.2026.459733

ARK: ark:/57805/tedm.2026.459733

Received: 03 February 2026, Accepted: 25 February 2026, Published: 28 February 2026

Abstract

Student performance prediction is a core task in educational data mining, as it enables early intervention, personalized learning support, and data-driven decision-making. Although existing machine learning models have shown promising results in this domain, challenges persist due to hard-to-classify samples—particularly students exhibiting borderline performance—and the discrete nature of hard labels, which together limit predictive effectiveness. To overcome these limitations, this paper proposes a KFWAdaBoost-based soft label learning framework that systematically enhances baseline model performance through a two-stage synergistic mechanism. In the first stage, K-means++ clustering is employed to generate similarity features, thereby providing structural awareness of underlying data patterns. In the second stage, probabilistic soft labels are derived from ensemble confidence scores to refine decision boundaries and better handle ambiguous cases. Experimental results on the widely used Mathematics and Portuguese Language course datasets demonstrate that the proposed framework consistently improves baseline performance across Accuracy, Precision, Recall, and F1-Score for models including LDA, Decision Tree, and SVM, with Decision Tree exhibiting the most substantial gains. This framework offers a reliable and effective approach for student performance prediction and holds strong potential for broader applications in educational data analytics.
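
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the paper's KFWAdaBoost implementation, whose details are not given here. The sketch makes several assumptions: similarity features are Gaussian kernels of the distance to each K-means++ centroid (with a hypothetical width `sigma`), and a soft label is the weight-normalized vote share of an AdaBoost-style ensemble. All function names and the toy data are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points, k, rng):
    """K-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre.
    Assumes the data contain at least k distinct points."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, rng, iters=20):
    """Lloyd iterations starting from the K-means++ seeds."""
    centres = kmeanspp_init(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            groups[j].append(p)
        # An empty cluster keeps its previous centre.
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


def similarity_features(x, centres, sigma=1.0):
    """Stage 1 (assumed form): one Gaussian similarity per centroid."""
    return [math.exp(-sq_dist(x, c) / (2 * sigma ** 2)) for c in centres]


def soft_labels(votes, alphas):
    """Stage 2 (assumed form): votes[t][i] in {0, 1} is weak learner t's
    prediction for sample i and alphas[t] > 0 is its ensemble weight.
    Returns the alpha-weighted share of votes for class 1 per sample."""
    total = sum(alphas)
    n = len(votes[0])
    return [sum(a * v[i] for a, v in zip(alphas, votes)) / total
            for i in range(n)]


# Toy data: two well-separated groups of 2-D points (hypothetical).
rng = random.Random(0)
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centres = kmeans(points, k=2, rng=rng)

# Append the similarity features to the original features.
augmented = [list(p) + similarity_features(p, centres) for p in points]

# Three weak learners voting on three samples (hypothetical values).
votes = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
alphas = [0.5, 0.3, 0.2]
probs = soft_labels(votes, alphas)
```

In this sketch a sample on which the weak learners disagree receives a probability strictly between 0 and 1, which is one way a borderline student could be represented more faithfully than with a hard 0/1 label.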

Keywords

soft label learning, KFWAdaBoost, K-means++ clustering, student performance prediction, educational data mining

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The author declares no conflicts of interest.

AI Use Statement

The author declares that no generative AI was used in the preparation of this manuscript.

Ethical Approval and Consent to Participate

This study uses a public, anonymized UCI dataset with no direct human involvement. Ethics approval and informed consent are not required.

Cite This Article

APA Style

Yu, Z. (2026). KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction. ICCK Transactions on Educational Data Mining, 2(1), 1–13. https://doi.org/10.62762/TEDM.2026.459733

BibTeX Format

@article{Yu2026KFWAdaBoos,
  author = {Zhihong Yu},
  title = {KFWAdaBoost-Based Soft Label Learning Framework for Student Performance Prediction},
  journal = {ICCK Transactions on Educational Data Mining},
  year = {2026},
  volume = {2},
  number = {1},
  pages = {1-13},
  doi = {10.62762/TEDM.2026.459733},
  url = {https://www.icck.org/article/abs/TEDM.2026.459733},
  keywords = {soft label learning, KFWAdaBoost, K-means++ clustering, student performance prediction, educational data mining},
  issn = {3070-5843},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Educational Data Mining

ISSN: 3070-5843 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/
