Capturing Poetic Essence: Text Summarization and Visual Generation via Multimodal

Junaid Yousaf; Mazhar Iqbal; Iqra Pervaiz; Muhammad Ismail; Toqeer Ul Islam; Khurram Khan Jadoon

doi:10.62762/TIS.2025.405393

ICCK Transactions on Intelligent Systematics, Volume 2, Issue 3, 2025

Academic Editor

Xuebo Jin

Beijing Technology and Business University, China

ICCK Transactions on Intelligent Systematics, Volume 2, Issue 3, 2025: 160-168

Research Article | 27 July 2025

Junaid Yousaf 1,*; Mazhar Iqbal 3; Iqra Pervaiz 2,*; Muhammad Ismail 4; Toqeer Ul Islam 5; Khurram Khan Jadoon 1

1 Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi 23460, Pakistan

2 Faculty of Computer Science, CECOS University of Information Technology and Emerging Sciences, Peshawar 25000, Pakistan

3 Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan

4 School of Information Technology, Deakin University, Geelong, Victoria 3220, Australia

5 School of Computing and Digital Technology, Birmingham City University, West Midlands B5 5JU, United Kingdom

* Corresponding Authors: Junaid Yousaf; Iqra Pervaiz

Received: 03 May 2025, Accepted: 05 June 2025, Published: 27 July 2025

Abstract

Poetry, as a profound and creative form of human expression, poses unique challenges for interpretation and summarization because it relies on figurative language, symbolism, and layered meaning. Building on the PoemSum dataset, which introduced the task of poem summarization, we extend its scope to multimodal applications. Specifically, we implement and fine-tune two state-of-the-art abstractive summarization models, BART and T5, to generate concise and meaningful interpretations of poems, focusing on figurative summarization that captures the metaphorical and symbolic elements inherent in poetic language. These summaries are then transformed into visual representations using two diffusion-based generative models, including Stable Diffusion for high-quality image generation. Our approach evaluates how well abstractive summarization models capture the essence of poetry and demonstrates how diffusion models can translate abstract poetic themes into visually compelling images. Evaluation results show that BART outperforms T5 in summarization, achieving a ROUGE score of 41.90 and a BERTScore of 85.22. For image generation, an Inception Score (IS) of 7.63 ± 0.62 reflects high visual quality and diversity, while a CLIP (Contrastive Language-Image Pre-training) Score of 29.48 indicates strong alignment between the textual summaries and the generated images.
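The abstract reports ROUGE, Inception Score, and CLIP Score values but does not spell out how each metric is computed. The sketch below illustrates the standard definitions in plain Python under several assumptions that are not taken from the paper. ROUGE is shown as ROUGE-1 F1 on lowercased whitespace tokens; the abstract does not say which ROUGE variant produced 41.90, and reference packages add their own tokenization and optional stemming. The Inception Score is computed from per-image class probabilities that are assumed to come from an Inception-v3 classifier run elsewhere. The CLIP Score is scaled as 100 × max(cosine, 0), the convention used by torchmetrics' `CLIPScore`, over embeddings assumed to come from a CLIP model. All function names are hypothetical, and this is not the authors' evaluation code.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each token counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def inception_score(probs, splits=1):
    """IS = exp(mean_x KL(p(y|x) || p(y))), returned as (mean, std) over splits.

    `probs` is a list of per-image class distributions (assumed classifier softmax outputs).
    """
    n = len(probs)
    scores = []
    for k in range(splits):
        part = probs[k * n // splits:(k + 1) * n // splits]
        num_classes = len(part[0])
        # Marginal class distribution p(y) within this split.
        p_y = [sum(p[c] for p in part) / len(part) for c in range(num_classes)]
        # KL(p(y|x) || p(y)) per image; zero-probability classes contribute nothing.
        kls = [
            sum(pc * math.log(pc / p_y[c]) for c, pc in enumerate(p) if pc > 0)
            for p in part
        ]
        scores.append(math.exp(mean(kls)))
    return mean(scores), (pstdev(scores) if splits > 1 else 0.0)


def clip_score(text_emb, image_emb, weight=100.0):
    """weight * max(cos(text_emb, image_emb), 0) for a pair of embedding vectors."""
    dot = sum(t * i for t, i in zip(text_emb, image_emb))
    norm = math.sqrt(sum(t * t for t in text_emb)) * math.sqrt(sum(i * i for i in image_emb))
    return weight * max(dot / norm, 0.0)
```

For example, two images whose classifier outputs are fully confident and fall into two different classes yield an Inception Score of 2, one per equally used class. Identical predictions for every image yield 1. This shows why a higher IS is read as a combination of confident and diverse generations.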

Keywords

poetry; summarization; image; diffusion model; multimodal; Transformer and BART

Data Availability Statement

Data will be made available on request.

Funding

This work received no external funding.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

References

Mahbub, R., Khan, I., Anuva, S., Shahriar, M. S., Laskar, M. T. R., & Ahmed, S. (2023, December). Unveiling the essence of poetry: Introducing a comprehensive dataset and bench

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.

Li, B., Qi, X., Lukasiewicz, T., & Torr, P. (2019). Controllable text-to-image generation. Advances in Neural Information Processing Systems, 32.

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., ... & Zettlemoyer, L. (2020). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 7871–7880). Association for Computational Linguistics.

Virmani, M., Pathak, M., Pai, K. S., & Prasad, V. B. (2023, May). Image synthesis from themes captured in poems using latent diffusion models. In 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) (pp. 655–660). IEEE.

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67.

Blattmann, A., Dockhorn, T., Kulal, S., Mendelevitch, D., Kilian, M., Lorenz, D., ... & Rombach, R. (2023). Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv preprint arXiv:2311.15127.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out (pp. 74–81).

Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684–10695).

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021, July). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763). PMLR.

Nasfi, R., De Tré, G., & Bronselaer, A. (2025). Improving data cleaning by learning from unstructured textual data. IEEE Access.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019, June). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 4171–4186).

Cite This Article

APA Style

Yousaf, J., Iqbal, M., Pervaiz, I., Ismail, M., Islam, T. U., & Jadoon, K. K. (2025). Capturing Poetic Essence: Text Summarization and Visual Generation via Multimodal. ICCK Transactions on Intelligent Systematics, 2(3), 160–168. https://doi.org/10.62762/TIS.2025.405393

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Intelligent Systematics

ISSN: 3068-5079 (Online) | ISSN: pending (Print)
