ICCK Transactions on Emerging Topics in Artificial Intelligence, Volume 2, Issue 4, 2025: 173-181

Open Access | Review Article | 14 September 2025
Reinforcement Learning for Prompt Optimization in Language Models: A Comprehensive Survey of Methods, Representations, and Evaluation Challenges
Zhangqi Liu 1,*
1 Brown University, Providence, RI 02912, United States
* Corresponding Author: Zhangqi Liu, [email protected]
Received: 01 August 2025, Accepted: 23 August 2025, Published: 14 September 2025  
Abstract
The growing prominence of prompt engineering as a means of controlling large language models has given rise to a diverse set of methods, ranging from handcrafted templates to embedding-level tuning. Yet, as prompts increasingly serve not merely as input scaffolds but as adaptive interfaces between users and models, the question of how to systematically optimize them remains unresolved. Reinforcement learning (RL), with its capacity for sequential decision-making and reward-driven adaptation, has been proposed as a possible framework for discovering effective prompting strategies. This survey explores the emerging intersection of RL and prompt engineering, organizing existing research along three interdependent axes: the representation of prompts (symbolic, soft, and hybrid), the design of RL-based optimization mechanisms, and the challenges of evaluating and generalizing learned prompt policies. Rather than presenting a single unified framework, the discussion reflects the fragmented, often experimental nature of current approaches, many of which remain constrained by unstable reward signals, limited generalizability, and a lack of reproducible evaluation standards. By analyzing methodological innovations and points of friction alike, this work aims to foster a more critical and reflective understanding of what it means to "learn to prompt" in complex, real-world language modeling contexts.
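To make the setting concrete, the simplest symbolic variant of RL-based prompt optimization can be framed as a multi-armed bandit over a discrete set of candidate templates. The sketch below is illustrative only and is not taken from the surveyed methods: the candidate prompts, the reward table, and the `noisy_reward` stub are hypothetical stand-ins for what would, in practice, be a language model call plus a task-specific scoring function.

```python
import random

# Illustrative sketch (assumptions, not the paper's method): treat symbolic
# prompt selection as a multi-armed bandit. Each "arm" is one candidate
# prompt template; the reward would normally come from scoring the language
# model's output on a validation task, stubbed here with a fixed table.
CANDIDATE_PROMPTS = [
    "Answer concisely: {q}",
    "Think step by step, then answer: {q}",
    "You are an expert. {q}",
]

# Hypothetical ground-truth quality of each template (unknown to the learner).
TRUE_REWARD = {0: 0.55, 1: 0.80, 2: 0.62}


def noisy_reward(arm: int, rng: random.Random) -> float:
    """Stub for 'run the LLM with this prompt and score the output'."""
    return TRUE_REWARD[arm] + rng.gauss(0, 0.05)


def epsilon_greedy(n_steps: int = 2000, epsilon: float = 0.1, seed: int = 0):
    """Epsilon-greedy bandit: explore a random template with prob. epsilon,
    otherwise exploit the template with the highest running mean reward."""
    rng = random.Random(seed)
    counts = [0] * len(CANDIDATE_PROMPTS)
    values = [0.0] * len(CANDIDATE_PROMPTS)
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(CANDIDATE_PROMPTS))
        else:
            arm = max(range(len(values)), key=values.__getitem__)
        r = noisy_reward(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # incremental mean
    return values, counts


values, counts = epsilon_greedy()
best = max(range(len(values)), key=values.__getitem__)
print("best prompt:", CANDIDATE_PROMPTS[best])
```

This bandit view captures only the stateless, symbolic end of the design space the survey covers; soft-prompt methods instead optimize continuous embeddings by gradient or policy-gradient updates, and the noisy stub reward here hints at why unstable reward signals are a recurring obstacle.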


Keywords
prompt engineering
reinforcement learning
language models
prompt optimization
reward design
prompt representation

Data Availability Statement
Not applicable.

Funding
This work received no external funding.

Conflicts of Interest
The author declares no conflicts of interest.

Ethical Approval and Consent to Participate
Not applicable.


Cite This Article
APA Style
Liu, Z. (2025). Reinforcement Learning for Prompt Optimization in Language Models: A Comprehensive Survey of Methods, Representations, and Evaluation Challenges. ICCK Transactions on Emerging Topics in Artificial Intelligence, 2(4), 173–181. https://doi.org/10.62762/TETAI.2025.790504

Article Metrics
Citations: Crossref 0 · Scopus 0 · Web of Science 0
Article Access Statistics: Views 55 · PDF Downloads 18

Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions
Copyright © 2025 by the Author(s). Published by Institute of Central Computation and Knowledge. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ICCK Transactions on Emerging Topics in Artificial Intelligence
ISSN: 3068-6652 (Online)
Email: [email protected]

All published articles are preserved permanently in Portico:
https://www.portico.org/publishers/icck/