Volume 2, Issue 3, ICCK Transactions on Intelligent Systematics
Academic Editor
Xuebo Jin
Beijing Technology and Business University, China
ICCK Transactions on Intelligent Systematics, Volume 2, Issue 3, 2025: 160-168

Research Article | 27 July 2025
Capturing Poetic Essence: Text Summarization and Visual Generation via Multimodal
1 Faculty of Computer Science and Engineering, Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Topi 23460, Pakistan
2 Faculty of Computer Science, CECOS University of Information Technology and Emerging Sciences, Peshawar 25000, Pakistan
3 Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan
4 School of Information Technology, Deakin University, Geelong, Victoria 3220, Australia
5 School of Computing and Digital Technology, Birmingham City University, West Midlands B5 5JU, United Kingdom
* Corresponding Authors: Junaid Yousaf, [email protected] ; Iqra Pervaiz, [email protected]
Received: 03 May 2025, Accepted: 05 June 2025, Published: 27 July 2025  
Abstract
Poetry, as a profound and creative form of human expression, presents unique challenges in interpretation and summarization because it relies on figurative language, symbolism, and layered meaning. Building on the PoemSum dataset, which introduced the task of poem summarization, we extend its scope to multimodal applications. Specifically, we fine-tune two state-of-the-art abstractive summarization models, BART and T5, to generate concise and meaningful interpretations of poems, focusing on figurative summarization that captures the metaphorical and symbolic elements inherent in poetic language. These summaries are then transformed into visual representations using a diffusion-based generative model, Stable Diffusion, for high-quality image generation. Our approach evaluates how effectively abstractive summarization models capture the essence of poetry and demonstrates how diffusion models can translate abstract poetic themes into visually compelling images. Evaluation results show that BART outperforms T5 in summarization, achieving a ROUGE score of 41.90 and a BERTScore of 85.22. For image generation, an Inception Score (IS) of 7.63 ± 0.62 reflects high visual quality and diversity, while a CLIP (Contrastive Language-Image Pre-training) Score of 29.48 indicates strong alignment between the textual summaries and the generated images.
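
To make the summarization metric concrete, the following is a minimal sketch of how a ROUGE-N score can be computed for a single summary pair. It is not the authors' evaluation code: the abstract does not say which ROUGE variant or toolkit was used, so the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N precision, recall, and F1 for one summary pair.

    Tokenization is plain lowercased whitespace splitting, an assumption
    made for illustration only (real toolkits may stem or normalize).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: an n-gram counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example pair, not taken from PoemSum.
reference = "the poem mourns a lost love"
candidate = "the poem mourns lost youth"

p1, r1, f1 = rouge_n(candidate, reference, n=1)
p2, r2, f2 = rouge_n(candidate, reference, n=2)
print(f"ROUGE-1 P={p1:.2f} R={r1:.2f} F1={f1:.2f}")
print(f"ROUGE-2 P={p2:.2f} R={r2:.2f} F1={f2:.2f}")
```

Scores computed this way lie in [0, 1]. A figure such as 41.90 is presumably the same kind of quantity multiplied by 100 and averaged over a test set, although the abstract does not state this explicitly.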

Keywords
poetry
summarization
image
diffusion model
multimodal
Transformer and BART

Data Availability Statement
Data will be made available on request.

Funding
This work received no external funding.

Conflicts of Interest
The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.

Cite This Article
APA Style
Yousaf, J., Iqbal, M., Pervaiz, I., Ismail, M., Islam, T. U., & Jadoon, K. K. (2025). Capturing Poetic Essence: Text Summarization and Visual Generation via Multimodal. ICCK Transactions on Intelligent Systematics, 2(3), 160–168. https://doi.org/10.62762/TIS.2025.405393

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
ICCK Transactions on Intelligent Systematics

ISSN: 3068-5079 (Online) | ISSN: pending (Print)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/