Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer
Research Article  ·  Published: 03 June 2025
Biomedical Informatics and Smart Healthcare
Volume 1, Issue 1, 2025: 18-26
Research Article Open Access

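Yogesh Tewari, Nitin Singh Parihar, Karan Rautela, Nishant Kaundal, Manoj Diwakar*, Neeraj Kumar Pandey
1 Department of Computer Science and Engineering, Graphic Era Deemed to be University, Dehradun 248002, India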
* Corresponding Author: Manoj Diwakar, [email protected]

Article Information

Abstract

Diabetic retinopathy (DR) occurs when elevated blood sugar levels damage the retinal blood vessels, potentially leading to vision impairment. In this paper, we evaluate the performance of convolutional neural network (CNN) models, a Vision Transformer (ViT) model, and hybrids of the two. The dataset is publicly available on Kaggle and contains around 35,000 retinal images divided into five classes: No DR, Mild DR, Moderate DR, Severe DR, and Proliferative DR. Of the four CNN architectures tested, ResNet50 achieved the best accuracy, 75.4%. The ViT model achieved an accuracy of 83.9%, and the hybrid ResNet50 + ViT model achieved 88.4%. These results are promising, but the study has limitations. The dataset is skewed towards the No DR class, so future work could use more balanced datasets together with data augmentation techniques. Additionally, the models were trained for only 50 epochs; training for longer in future work may allow them to reach their full potential.
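
The class skew noted in the abstract can be made concrete with a small sketch. The per-class counts below are hypothetical (the abstract gives only the approximate total of 35,000 images and the five class names), and the helper names `balanced_class_weights` and `majority_baseline_accuracy` are illustrative, not the authors' code. The sketch shows how a classifier that always predicts the majority class scores on a skewed dataset, and one common inverse-frequency weighting, N / (K * n_c), that could be supplied to a weighted loss.

```python
from collections import Counter

CLASSES = ["No DR", "Mild DR", "Moderate DR", "Severe DR", "Proliferative DR"]


def balanced_class_weights(labels):
    """Return the weight N / (K * n_c) for every class c present in labels.

    N is the number of samples, K the number of distinct classes,
    and n_c the number of samples in class c.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {c: total / (num_classes * n) for c, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


# HYPOTHETICAL counts per class, chosen only to illustrate a skew
# towards "No DR"; they are not taken from the paper or the dataset.
hypothetical_counts = [700, 100, 120, 50, 30]

labels = [
    name
    for name, n in zip(CLASSES, hypothetical_counts)
    for _ in range(n)
]

weights = balanced_class_weights(labels)
baseline = majority_baseline_accuracy(labels)

print(f"majority-class baseline accuracy: {baseline:.3f}")
for name in CLASSES:
    print(f"{name:>16}: weight {weights[name]:.3f}")
```

With these made-up counts, always predicting No DR already yields 70% accuracy, which is why per-class metrics are worth reporting alongside overall accuracy when the data are skewed.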

Keywords

diabetic retinopathy, CNN, ViT, deep learning, image classification

Data Availability Statement

Data will be made available on request.

Funding

This work received no funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style
Tewari, Y., Parihar, N. S., Rautela, K., Kaundal, N., Diwakar, M., & Pandey, N. K. (2025). Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer. Biomedical Informatics and Smart Healthcare, 1(1), 18–26. https://doi.org/10.62762/BISH.2025.724307
BibTeX Format
@article{Tewari2025Diabetic,
  author = {Yogesh Tewari and Nitin Singh Parihar and Karan Rautela and Nishant Kaundal and Manoj Diwakar and Neeraj Kumar Pandey},
  title = {Diabetic Retinopathy Detection and Analysis with Convolutional Neural Networks and Vision Transformer},
  journal = {Biomedical Informatics and Smart Healthcare},
  year = {2025},
  volume = {1},
  number = {1},
  pages = {18-26},
  doi = {10.62762/BISH.2025.724307},
  url = {https://www.icck.org/article/abs/BISH.2025.724307},
  keywords = {diabetic retinopathy, CNN, ViT, deep learning, image classification},
  issn = {3068-5524},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

CC BY Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
Biomedical Informatics and Smart Healthcare
ISSN: 3068-5524 (Online)
Preserved at Portico