Abstract
Capsicum spp. varieties are morphologically similar, which makes accurate identification difficult. To address this issue, this study applies a hybrid computational approach that combines dimensionality reduction, using Principal Component Analysis and Factor Analysis, with several supervised Machine Learning algorithms. The dataset, which has not previously been reported in the literature and was collected under controlled agricultural conditions, enables a robust evaluation of models including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Random Forest, Decision Tree, and Gradient Boosting. Model performance was assessed using Leave-One-Out and K-Fold cross-validation. Additionally, the SHapley Additive exPlanations (SHAP) method was applied to assess feature importance in species classification, improving interpretability and reinforcing the relevance of morphological and agronomic descriptors in differentiating pepper varieties. All models achieved accuracy, F1-score, precision, and recall consistently above 0.89, which supports the effectiveness of the proposed approach. These findings highlight the potential of integrated Machine Learning frameworks for species classification in agriculture, contributing to practical applications and advancing intelligent analysis of biological data.
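As an illustration of one evaluation scheme named above, the following is a minimal sketch of Leave-One-Out cross-validation with a K-Nearest Neighbors classifier. It is not the authors' pipeline: the data are synthetic, the two descriptors and the labels `variety_a` and `variety_b` are hypothetical, and the dimensionality reduction and SHAP steps are omitted. Only the Python standard library is used.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def make_toy_data(n_per_class=20, seed=0):
    """Synthetic stand-in (assumption): two varieties described by two hypothetical descriptors."""
    rng = random.Random(seed)
    centers = {"variety_a": (2.0, 10.0), "variety_b": (6.0, 4.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            y.append(label)
    return X, y


X, y = make_toy_data()
accuracy = leave_one_out_accuracy(X, y, k=3)
print(f"LOO accuracy on toy data: {accuracy:.2f}")
```

Leave-One-Out trains one model per sample, so it suits small datasets; K-Fold follows the same pattern but holds out a block of samples at each step instead of a single one.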
Keywords
species prediction
pepper
machine learning
classification algorithms
factor analysis
Data Availability Statement
The dataset is available upon request and can also be accessed at the following link: https://github.com/Matheuscp98/PepperCapsicum.
Funding
This work was supported in part by FAPEMIG under Grant BPD-01045-22; in part by CAPES; in part by CNPq; and in part by NOMATI–UNIFEI, which provided access to laboratories, materials, and technical expertise.
Conflicts of Interest
The authors declare no conflicts of interest.
Ethical Approval and Consent to Participate
Not applicable.
Cite This Article
APA Style
Pereira, M. S., de Azevedo, T. M., Ribeiro, C. T., Freire, A. I., Francisco, M. B., Pereira, J. L. J., & de Paiva, A. P. (2025). Discriminating Planted Capsicum Spp. Varieties via Machine Learning and Multivariate Data Reduction. ICCK Transactions on Machine Intelligence, 1(3), 166–185. https://doi.org/10.62762/TMI.2025.385133
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.