Volume 1, Issue 3, ICCK Transactions on Machine Intelligence
ICCK Transactions on Machine Intelligence, Volume 1, Issue 3, 2025: 166-185

Free to Read | Research Article | Feature Paper | 15 November 2025
Discriminating Planted Capsicum Spp. Varieties via Machine Learning and Multivariate Data Reduction
1 Industrial and Management Engineering Institute, Federal University of Itajubá (UNIFEI), Itajubá, 37500-903, Brazil
* Corresponding Author: Matheus Costa Pereira, [email protected]
Received: 12 August 2025, Accepted: 19 October 2025, Published: 15 November 2025  
Abstract
Capsicum spp. varieties are morphologically similar, which makes accurate identification difficult. To address this, this study applies a hybrid computational approach that combines dimensionality reduction, using Principal Component Analysis and Factor Analysis, with several supervised Machine Learning algorithms. The dataset, which is unprecedented in the literature and was collected under controlled agricultural conditions, supports a robust evaluation of models including Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Random Forest, Decision Tree, and Gradient Boosting. Model performance was assessed with Leave-One-Out and K-Fold cross-validation. In addition, the SHapley Additive exPlanations method was applied to assess feature importance in species classification, adding interpretability and reinforcing the relevance of morphological and agronomic descriptors in differentiating pepper varieties. All models achieved accuracy, F1-score, precision, and recall consistently above 0.89, validating the effectiveness of the proposed approach. These findings highlight the potential of integrated Machine Learning frameworks for species classification in agriculture, contributing to practical applications and advancing intelligent analysis of biological data.
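
The validation scheme named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It uses only the Python standard library, a hand-written K-Nearest Neighbors classifier, and a small synthetic two-class dataset invented for illustration; none of these values come from the paper's dataset. The function names (`k_fold_indices`, `knn_predict`, `cross_val_predict`, `macro_metrics`) are hypothetical, and the dimensionality-reduction and SHAP steps are left out. Setting the number of folds equal to the number of samples gives Leave-One-Out.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold CV; k == n is Leave-One-Out."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_predict(X, y, k_folds, k_neighbors=3):
    """Predict every sample exactly once, using a model fit on the other folds."""
    preds = [None] * len(X)
    for train, test in k_fold_indices(len(X), k_folds):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in test:
            preds[i] = knn_predict(X_tr, y_tr, X[i], k_neighbors)
    return preds


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Synthetic, well-separated toy data (illustrative only, not the paper's data).
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.2), (4.9, 4.8)]
y = ["A"] * 5 + ["B"] * 5

for name, folds in (("5-fold", 5), ("leave-one-out", len(X))):
    print(name, macro_metrics(y, cross_val_predict(X, y, folds)))
```

Because each sample is predicted only by a model that never saw it, the reported metrics estimate out-of-sample performance. Leave-One-Out uses the largest possible training set per fit at the cost of one fit per sample, which is why it suits small datasets.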

Keywords
species prediction
pepper
machine learning
classification algorithms
factor analysis

Data Availability Statement
The dataset is available upon request and can also be accessed at the following link: https://github.com/Matheuscp98/PepperCapsicum.

Funding
This work was supported in part by FAPEMIG under Grant BPD-01045-22, in part by CAPES, in part by CNPq, and in part by NOMATI–UNIFEI, which provided access to laboratories, materials, and technical expertise.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Pereira, M. S., de Azevedo, T. M., Ribeiro, C. T., Freire, A. I., Francisco, M. B., Pereira, J. L. J., & de Paiva, A. P. (2025). Discriminating Planted Capsicum Spp. Varieties via Machine Learning and Multivariate Data Reduction. ICCK Transactions on Machine Intelligence, 1(3), 166–185. https://doi.org/10.62762/TMI.2025.385133

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Machine Intelligence

ISSN: 3068-7403 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/