Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning

Renfang Wang; Yufei Gong; Peng Su; Linmin Hu; Xin Jiang

doi:10.62762/TSSR.2025.621059

Volume 1, Issue 2, ICCK Transactions on Systems Safety and Reliability

ICCK Transactions on Systems Safety and Reliability, Volume 1, Issue 2, 2025: 98-113

Free to Read | Research Article | 11 November 2025

Renfang Wang 1

Yufei Gong 1

Peng Su 2

Linmin Hu 1 *

Xin Jiang 1

1 School of Science, Yanshan University, Qinhuangdao 066004, China

2 School of Economics and Management, North University of China, Taiyuan 030051, China

* Corresponding Author: Linmin Hu, [email protected]

Received: 22 August 2025, Accepted: 22 September 2025, Published: 11 November 2025

Abstract

This study introduces a novel approach to production decision-making that applies reinforcement learning to optimize the Economic Manufacturing Quantity (EMQ) model in discrete-time production-inventory systems. A Markov Decision Process (MDP) is constructed from machine status, inventory levels, and production choices, and is combined with the Q-learning algorithm to derive an adaptive control method. The method adapts production decisions dynamically, balancing normal operation against shutdown-for-rest states. Numerical simulations show that the proposed reinforcement learning model outperforms conventional EMQ models and steady-state probability models in both convergence speed and cost-effectiveness. The study offers a data-driven approach to optimizing production processes in smart manufacturing settings and supports the evolution of production-inventory systems from static planning to dynamic, intelligent decision-making.
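As a rough illustration of how an MDP over machine status and inventory level can be paired with tabular Q-learning, the sketch below uses a toy model that is not taken from the paper. The cost parameters, demand distribution, production rate, inventory cap, and the function names `step`, `q_learning`, and `greedy_policy` are all hypothetical assumptions chosen only to make the example runnable. The reward is the negative one-period cost, and the update is the standard Q-learning rule Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)].

```python
import random

# All constants below are hypothetical, chosen only for illustration.
MAX_INV = 10        # inventory cap (units)
PROD_RATE = 2       # units produced per period while running
HOLD_COST = 1.0     # cost per unit held at end of period
SHORT_COST = 10.0   # cost per unit of unmet demand (lost sales)
SETUP_COST = 5.0    # cost of restarting a resting machine
ACTIONS = (0, 1)    # 0 = shut down for rest, 1 = produce


def step(state, action, rng):
    """Simulate one period; return (next_state, reward) with reward = -cost."""
    machine, inv = state
    demand = rng.choice((0, 1, 2))  # assumed demand distribution
    produced = PROD_RATE if action == 1 else 0
    cost = SETUP_COST if (action == 1 and machine == 0) else 0.0
    net = inv + produced - demand
    if net < 0:
        cost += SHORT_COST * (-net)  # unmet demand is lost
        net = 0
    net = min(net, MAX_INV)          # overflow above the cap is discarded
    cost += HOLD_COST * net
    return (action, net), -cost      # machine status follows the action


def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.95,
               eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = {((m, i), a): 0.0
         for m in (0, 1) for i in range(MAX_INV + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = (0, rng.randint(0, MAX_INV))
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward = step(state, action, rng)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            td_error = reward + gamma * best_next - q[(state, action)]
            q[(state, action)] += alpha * td_error
            state = nxt
    return q


def greedy_policy(q):
    """Map each (machine, inventory) state to its highest-valued action."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for (s, _a) in q}
```

Calling `greedy_policy(q_learning())` returns a dictionary from `(machine_status, inventory)` pairs to a produce-or-rest decision, which can be inspected to see at which inventory levels the learned policy switches the machine off. The paper's actual state space, cost structure, and hyperparameters may differ from this sketch.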

Keywords

reinforcement learning

economic manufacturing quantity

production inventory optimization

Q-Learning

dynamic decision-making

Data Availability Statement

Data will be made available on request.

Funding

This work was supported in part by the Shijiazhuang Science and Technology Project under Grant 241790737A, in part by the Natural Science Foundation of Hebei Province under Grant G2025203034, and in part by the Shanxi Provincial Basic Research Program Youth Project under Grant 202403021212004.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cite This Article

APA Style

Wang, R., Gong, Y., Su, P., Hu, L., & Jiang, X. (2025). Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning. ICCK Transactions on Systems Safety and Reliability, 1(2), 98–113. https://doi.org/10.62762/TSSR.2025.621059

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Systems Safety and Reliability

ISSN: 3069-1087 (Online)

Email: [email protected]

Portico

All published articles are preserved here permanently:
https://www.portico.org/publishers/icck/
