Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning

Renfang Wang; Yufei Gong; Peng Su; Linmin Hu; Xin Jiang

doi:10.62762/TSSR.2025.621059

Article Information

Published in: ICCK Transactions on Systems Safety and Reliability

Volume/Issue: Volume 1, Issue 2, 2025

Pages: 98–113

Cited by: 1 (Crossref), 1 (Scopus)

Abstract

This study introduces a novel approach to production decision-making that applies reinforcement learning to optimize the Economic Manufacturing Quantity (EMQ) model in discrete-time production-inventory systems. A Markov Decision Process (MDP) incorporating machine status, inventory levels, and production choices is constructed and combined with the Q-learning algorithm to derive an adaptive control method. The method adapts production decisions dynamically by balancing normal operation against shutdown (rest) states. Numerical simulations show that the proposed reinforcement learning model outperforms conventional EMQ models and steady-state probability models in both convergence speed and cost-effectiveness. This study offers a data-driven approach for optimizing production processes in smart manufacturing settings and supports the evolution of production-inventory systems from static planning to dynamic, intelligent decision-making.
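The MDP-plus-Q-learning idea described in the abstract can be illustrated with a minimal tabular sketch. Every concrete detail below is a hypothetical assumption made for illustration and is not the authors' model:

- the state is a pair (machine up/down, inventory level);
- the two actions are REST (shut down) and PRODUCE;
- demand is Bernoulli;
- the machine may break down while producing and is repaired with a fixed probability;
- all parameter values (`I_MAX`, `PROD_RATE`, `P_FAIL`, `P_REPAIR`, `HOLD`, `SHORT`, `PROD`) are invented.

The learning rule is the standard cost-minimizing Q-learning update with ε-greedy exploration:

Q(s, a) ← Q(s, a) + α [c + γ · min over a' of Q(s', a') − Q(s, a)]

```python
import random

# Hypothetical toy parameters (illustrative assumptions, not from the paper)
I_MAX = 10        # inventory capacity in units
PROD_RATE = 2     # units produced per period when the machine is up and producing
P_DEMAND = 0.6    # probability that one unit is demanded in a period
P_FAIL = 0.05     # per-period breakdown probability while producing
P_REPAIR = 0.3    # per-period repair probability while the machine is down
HOLD, SHORT, PROD = 1.0, 20.0, 2.0  # holding cost/unit, shortage penalty, production cost/period

REST, PRODUCE = 0, 1
ACTIONS = (REST, PRODUCE)


def step(state, action, rng):
    """Simulate one period of the toy system; return (next_state, cost)."""
    up, inv = state
    cost = 0.0
    if up and action == PRODUCE:
        # Production only happens on a working machine, which may then fail
        inv = min(I_MAX, inv + PROD_RATE)
        cost += PROD
        if rng.random() < P_FAIL:
            up = False
    elif not up and rng.random() < P_REPAIR:
        # A down machine may be repaired regardless of the chosen action
        up = True
    if rng.random() < P_DEMAND:
        if inv > 0:
            inv -= 1
        else:
            cost += SHORT  # unmet demand is penalized
    cost += HOLD * inv
    return (up, inv), cost


def q_learning(episodes=500, horizon=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning that minimizes expected discounted cost."""
    rng = random.Random(seed)
    # q[(up, inv)][a] estimates the discounted cost of taking action a in (up, inv)
    q = {(up, inv): [0.0, 0.0] for up in (True, False) for inv in range(I_MAX + 1)}
    for _ in range(episodes):
        state = (True, 0)
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = min(ACTIONS, key=lambda a: q[state][a])  # exploit (lowest cost)
            nxt, cost = step(state, action, rng)
            target = cost + gamma * min(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Map every state to its lowest-cost action."""
    return {s: min(ACTIONS, key=lambda a: q[s][a]) for s in q}


def average_cost(policy, periods=20000, seed=1):
    """Long-run average cost per period of a fixed policy, by simulation."""
    rng = random.Random(seed)
    state, total = (True, 0), 0.0
    for _ in range(periods):
        state, cost = step(state, policy[state], rng)
        total += cost
    return total / periods


if __name__ == "__main__":
    q = q_learning()
    policy = greedy_policy(q)
    always_produce = {s: PRODUCE for s in q}
    print("action by inventory level (machine up):", [policy[(True, i)] for i in range(I_MAX + 1)])
    print("average cost, learned policy:", average_cost(policy))
    print("average cost, always produce:", average_cost(always_produce))
```

Comparing `average_cost` for the learned greedy policy with a fixed rule (here, always producing) is one simple sanity check. The abstract's own comparison is against conventional EMQ and steady-state probability models, which this sketch does not implement.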


Keywords

reinforcement learning; economic manufacturing quantity; production inventory optimization; Q-Learning; dynamic decision-making

Data Availability Statement

Data will be made available on request.

Funding

This work was supported in part by the Shijiazhuang Science and Technology Project under Grant 241790737A; in part by the Natural Science Foundation of Hebei Province under Grant G2025203034; and in part by the Shanxi Provincial Basic Research Program Youth Project under Grant 202403021212004.

Conflicts of Interest

The authors declare no conflicts of interest.

Ethical Approval and Consent to Participate

Not applicable.

Cited By (1)

Longfei Wang, Keyu Wang, Libin Tan, Zhaojun Li. Production Planning and Supply Reliability Analysis of Core Components of Wind Turbines. International Journal of Reliability, Quality and Safety Engineering, 2026, 33(04).
[CrossRef]

* Citation data provided by Crossref Cited-by.

Cite This Article

APA Style

Wang, R., Gong, Y., Su, P., Hu, L., & Jiang, X. (2025). Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning. ICCK Transactions on Systems Safety and Reliability, 1(2), 98–113. https://doi.org/10.62762/TSSR.2025.621059

Export Citation

RIS Format

TY  - JOUR
AU  - Wang, Renfang
AU  - Gong, Yufei
AU  - Su, Peng
AU  - Hu, Linmin
AU  - Jiang, Xin
PY  - 2025
DA  - 2025/11/11
TI  - Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning
JO  - ICCK Transactions on Systems Safety and Reliability
T2  - ICCK Transactions on Systems Safety and Reliability
JF  - ICCK Transactions on Systems Safety and Reliability
VL  - 1
IS  - 2
SP  - 98
EP  - 113
DO  - 10.62762/TSSR.2025.621059
UR  - https://www.icck.org/article/abs/TSSR.2025.621059
KW  - reinforcement learning
KW  - economic manufacturing quantity
KW  - production inventory optimization
KW  - Q-Learning
KW  - dynamic decision-making
SN  - 3069-1087
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  -

BibTeX Format

@article{Wang2025Optimizati,
  author = {Renfang Wang and Yufei Gong and Peng Su and Linmin Hu and Xin Jiang},
  title = {Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning},
  journal = {ICCK Transactions on Systems Safety and Reliability},
  year = {2025},
  volume = {1},
  number = {2},
  pages = {98-113},
  doi = {10.62762/TSSR.2025.621059},
  url = {https://www.icck.org/article/abs/TSSR.2025.621059},
  keywords = {reinforcement learning, economic manufacturing quantity, production inventory optimization, Q-Learning, dynamic decision-making},
  issn = {3069-1087},
  publisher = {Institute of Central Computation and Knowledge}
}

Publisher's Note

ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and Permissions

Institute of Central Computation and Knowledge (ICCK) or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

ICCK Transactions on Systems Safety and Reliability

ISSN: 3069-1087 (Online)

Preserved at Portico