Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer
Article Information
Abstract
Diabetic retinopathy (DR) occurs when elevated blood sugar levels damage the blood vessels of the retina, potentially leading to vision impairment. In this paper, we evaluate the performance of convolutional neural networks (CNNs), a Vision Transformer (ViT), and hybrid models that combine the two. The dataset is publicly available on Kaggle and contains around 35,000 retinal images divided into five classes: No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR. Among the four CNN architectures tested, ResNet50 achieved the best accuracy at 75.4%; the ViT model achieved 83.9%, and the hybrid ResNet50 + ViT model achieved 88.4%. These results are promising, but the study has limitations. The dataset is skewed towards the No DR class, so future work could use more balanced datasets together with data augmentation. In addition, the models were trained for only 50 epochs; longer training in future work may allow them to reach their full potential.
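The abstract notes that the dataset is skewed towards the No DR class. One common way to compensate for such a skew, besides rebalancing or augmenting the data, is to weight each class inversely to its frequency in the loss. The following is a minimal sketch of that idea and not the method used in the paper: the function name `inverse_frequency_weights` and the label counts are hypothetical and chosen only for illustration, and the formula `total / (n_classes * count)` is the "balanced" heuristic also used by scikit-learn.

```python
from collections import Counter

# The five severity grades named in the abstract, indexed 0..4.
CLASS_NAMES = ["No DR", "Mild DR", "Moderate DR", "Severe DR", "Proliferative DR"]


def inverse_frequency_weights(labels, n_classes):
    """Return one weight per class: total / (n_classes * class_count).

    Rare classes receive weights above 1 and frequent classes below 1,
    so every class contributes equally to a weighted loss in aggregate.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(n_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no samples; its weight is undefined")
        weights.append(total / (n_classes * counts[c]))
    return weights


# Hypothetical toy labels: these counts are NOT the real dataset distribution.
toy_labels = [0] * 700 + [1] * 100 + [2] * 100 + [3] * 50 + [4] * 50
toy_weights = inverse_frequency_weights(toy_labels, len(CLASS_NAMES))

for name, weight in zip(CLASS_NAMES, toy_weights):
    print(f"{name}: {weight:.2f}")
```

Weights of this kind would typically be passed to a weighted cross-entropy loss during training. Whether this helps more than resampling or augmentation on this dataset is an open question that the abstract leaves to future work.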
Keywords
diabetic retinopathy, CNN, ViT, deep learning, image classification
Cite This Article
TY  - JOUR
AU  - Tewari, Yogesh
AU  - Parihar, Nitin Singh
AU  - Rautela, Karan
AU  - Kaundal, Nishant
AU  - Diwakar, Manoj
AU  - Pandey, Neeraj Kumar
PY  - 2025
DA  - 2025/06/03
TI  - Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer
JO  - Biomedical Informatics and Smart Healthcare
T2  - Biomedical Informatics and Smart Healthcare
JF  - Biomedical Informatics and Smart Healthcare
VL  - 1
IS  - 1
SP  - 18
EP  - 26
DO  - 10.62762/BISH.2025.724307
UR  - https://www.icck.org/article/abs/BISH.2025.724307
KW  - diabetic retinopathy
KW  - CNN
KW  - ViT
KW  - deep learning
KW  - image classification
AB  - Diabetic Retinopathy occurs when elevated blood sugar levels damage retinal blood vessels, potentially leading to vision impairment. In this paper, we have tested the performance of CNN, ViT and their hybrid models. The dataset used is publicly available on Kaggle and the dataset contained around 35,000 retinal images which were divided into 5 classes namely No DR, Mild DR, Moderate DR, Severe DR and Proliferative DR. In CNN we tested 4 different architectures in which we achieved the best accuracy of 75.4% with Resnet50 architecture and with ViT model we achieved an accuracy of 83.9% and from the hybrid model we achieved an accuracy of 88.4% from the Resnet50 + ViT. The results shown by the models were promising but there were some gaps in the study. The dataset used was skewed towards NO DR class. For future work more balanced datasets with some data augmentation techniques could be used. Additionally, the study used only 50 epochs which can be increased in future work to use the model to their full potential.
SN  - 3068-5524
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -
@article{Tewari2025Diabetic,
author = {Yogesh Tewari and Nitin Singh Parihar and Karan Rautela and Nishant Kaundal and Manoj Diwakar and Neeraj Kumar Pandey},
title = {Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer},
journal = {Biomedical Informatics and Smart Healthcare},
year = {2025},
volume = {1},
number = {1},
pages = {18-26},
doi = {10.62762/BISH.2025.724307},
url = {https://www.icck.org/article/abs/BISH.2025.724307},
abstract = {Diabetic Retinopathy occurs when elevated blood sugar levels damage retinal blood vessels, potentially leading to vision impairment. In this paper, we have tested the performance of CNN, ViT and their hybrid models. The dataset used is publicly available on Kaggle and the dataset contained around 35,000 retinal images which were divided into 5 classes namely No DR, Mild DR, Moderate DR, Severe DR and Proliferative DR. In CNN we tested 4 different architectures in which we achieved the best accuracy of 75.4\% with Resnet50 architecture and with ViT model we achieved an accuracy of 83.9\% and from the hybrid model we achieved an accuracy of 88.4\% from the Resnet50 + ViT. The results shown by the models were promising but there were some gaps in the study. The dataset used was skewed towards NO DR class. For future work more balanced datasets with some data augmentation techniques could be used. Additionally, the study used only 50 epochs which can be increased in future work to use the model to their full potential.},
keywords = {diabetic retinopathy, CNN, ViT, deep learning, image classification},
issn = {3068-5524},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and Permissions
Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.