ICCK Transactions on Machine Intelligence, Volume 1, Issue 2, 2025: 52-68

Free to Read | Research Article | 01 August 2025
A Novel Image Captioning Technique Using Deep Learning Methodology
A. Khan 1 and J. Singh 1,*
1 Department of AIML-CSE, Apex Institute of Technology, Chandigarh University, Mohali, India
* Corresponding Author: Jaswinder Singh, [email protected]
Received: 11 March 2025, Accepted: 23 May 2025, Published: 01 August 2025  
Abstract
The capacity of machines to generate captions for images autonomously is a major step forward in artificial intelligence and language understanding. This paper presents an advanced image captioning system that uses deep learning techniques, notably convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to produce contextually appropriate and meaningful descriptions of visual content. The proposed technique extracts features using the DenseNet201 model, which allows a thorough, hierarchical understanding of image components. These extracted features are then processed by a long short-term memory (LSTM) network, an RNN variant designed to capture sequential dependencies in language, yielding captions that are coherent and fluent. The model is trained and evaluated on the well-known Flickr8k dataset, attaining competitive performance as judged by BLEU score metrics and demonstrating its capacity to produce human-like descriptions. This combination of CNNs and RNNs demonstrates the value of merging computer vision and natural language processing for automated caption generation. The approach has potential applications across a range of industries, including assistive technology for the visually impaired, automated content production for digital media, enhanced indexing and retrieval of multimedia assets, and improved human-computer interaction. Furthermore, advances in attention mechanisms and transformer-based models offer opportunities to improve the accuracy and contextual relevance of image captioning models. The study emphasizes the larger implications of machine-generated captions for increasing accessibility, boosting searchability in large-scale databases, and enabling seamless AI-human cooperation in content interpretation and storytelling.
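The abstract reports performance in terms of BLEU score metrics. As an illustration of how that metric works (this is not the authors' evaluation code, and real evaluations typically use a library implementation with smoothing and multiple references), a minimal unsmoothed sentence-level BLEU against a single reference can be computed in pure Python:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precisions for
    n = 1..max_n, geometric mean, and a brevity penalty. No smoothing,
    so any missing n-gram order drives the score to zero."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * geo_mean

# Example: a generated caption scored against one reference caption.
reference = "a dog is running on the grass".split()
candidate = "a dog runs on the grass".split()
score = bleu(candidate, reference, max_n=2)
```

Because this sketch is unsmoothed and single-reference, short captions with no matching higher-order n-grams score exactly zero; published results generally use smoothed BLEU over all references in the test split.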

Graphical Abstract
A Novel Image Captioning Technique Using Deep Learning Methodology

Keywords
convolutional neural networks (CNN)
recurrent neural networks (RNN)
deep learning
image captioning
LSTM
DenseNet201
attention mechanism
BLEU score
natural language processing (NLP)
multimodal learning
content retrieval

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Khan, A., & Singh, J. (2025). A Novel Image Captioning Technique Using Deep Learning Methodology. ICCK Transactions on Machine Intelligence, 1(2), 52–68. https://doi.org/10.62762/TMI.2025.886122


Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Machine Intelligence

ISSN: 3068-7403 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/