Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning
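Article Information
Renfang Wang, Yufei Gong, Peng Su, Linmin Hu, Xin Jiang
ICCK Transactions on Systems Safety and Reliability, Vol. 1, No. 2, pp. 98-113, 2025 (published 2025/11/11)
DOI: 10.62762/TSSR.2025.621059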
Abstract
This study improves production decision-making by applying reinforcement learning to optimize the Economic Manufacturing Quantity (EMQ) model in discrete-time production-inventory systems. A Markov Decision Process (MDP) that incorporates machine status, inventory level, and production decisions is constructed and combined with the Q-learning algorithm to derive an adaptive control method. The method adapts production decisions dynamically, balancing normal operation against shutdown for rest. Numerical simulations show that the proposed reinforcement learning model outperforms conventional EMQ models and steady-state probability models in both convergence speed and cost-effectiveness. The study offers a data-driven approach to optimizing production processes in smart manufacturing settings and supports the shift of production-inventory systems from static planning to dynamic, intelligent decision-making.
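The abstract describes an MDP over machine status and inventory level that is solved with Q-learning. The sketch below illustrates that general idea on a hypothetical toy system; it is not the authors' model, and their state space, costs, and parameters may differ. Everything here is an assumption chosen for illustration: the capacity `I_MAX`, production rate `P`, demand `D`, setup cost `K`, unit production cost `C`, holding cost `H`, shortage cost `S`, the breakdown and repair probabilities, and the function names `step`, `q_learning`, `greedy_policy`, and `average_cost`. The machine is OFF (resting), ON, or BROKEN; each period the controller chooses REST or PRODUCE, and restarting from rest incurs a setup cost. Because this toy model shares the EMQ trade-off between setup and holding costs, the classical EMQ lot size sqrt(2KD / (h(1 - D/P))) is computed only as a reference point.

```python
import math
import random

# Hypothetical toy parameters (illustrative assumptions, not taken from the paper).
I_MAX = 10                          # inventory capacity
P, D = 3, 1                         # units produced per period, units demanded per period
K, C, H, S = 10.0, 1.0, 1.0, 20.0   # setup, unit production, holding, shortage costs
FAIL_P, REPAIR_P = 0.05, 0.5        # breakdown prob. while producing, repair prob. per period
OFF, ON, BROKEN = 0, 1, 2           # machine status
REST, PRODUCE = 0, 1                # actions


def emq_lot_size(k, d, h, p):
    """Classical EMQ lot size sqrt(2kd / (h(1 - d/p))), used only as a reference."""
    return math.sqrt(2 * k * d / (h * (1 - d / p)))


def step(state, action, rng):
    """Simulate one period of the toy system: produce (or not), then meet demand."""
    status, inv = state
    setup = 0.0
    if status == BROKEN:                      # action is ignored while under repair
        produced = 0
        next_status = OFF if rng.random() < REPAIR_P else BROKEN
    elif action == PRODUCE:
        produced = P
        setup = K if status == OFF else 0.0   # restarting from rest incurs a setup cost
        next_status = BROKEN if rng.random() < FAIL_P else ON
    else:
        produced = 0
        next_status = OFF                     # resting shuts the machine down
    available = inv + produced
    sold = min(available, D)                  # unmet demand is lost
    next_inv = min(available - sold, I_MAX)   # output beyond capacity is discarded
    cost = setup + C * produced + H * next_inv + S * (D - sold)
    return (next_status, next_inv), cost


STATES = [(m, i) for m in (OFF, ON, BROKEN) for i in range(I_MAX + 1)]


def q_learning(episodes=500, horizon=100, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular epsilon-greedy Q-learning that minimises discounted cost."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in (REST, PRODUCE)}
    for _ in range(episodes):
        state = rng.choice(STATES)            # random starts improve state coverage
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice((REST, PRODUCE))
            else:
                action = min((REST, PRODUCE), key=lambda a: q[(state, a)])
            nxt, cost = step(state, action, rng)
            target = cost + gamma * min(q[(nxt, REST)], q[(nxt, PRODUCE)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Pick the lowest-cost action in every state."""
    return {s: min((REST, PRODUCE), key=lambda a: q[(s, a)]) for s in STATES}


def average_cost(policy, periods=5000, seed=1):
    """Long-run average cost per period of a fixed policy, starting from (OFF, 0)."""
    rng = random.Random(seed)
    state, total = (OFF, 0), 0.0
    for _ in range(periods):
        state, cost = step(state, policy[state], rng)
        total += cost
    return total / periods


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    always_rest = {s: REST for s in STATES}
    print("EMQ reference lot size:", round(emq_lot_size(K, D, H, P), 2))
    print("avg cost, learned policy:", round(average_cost(policy), 2))
    print("avg cost, always rest:", round(average_cost(always_rest), 2))
```

Because the controller minimizes cost rather than maximizing reward, both the greedy action choice and the bootstrap target use `min` over actions. Random episode starts are only one simple way to make every state-action pair reachable during training; the paper's exploration scheme and learning parameters are not reproduced here.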
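Keywords
reinforcement learning, economic manufacturing quantity, production inventory optimization, Q-Learning, dynamic decision-making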
Cited By (1)
- Longfei Wang, Keyu Wang, Libin Tan, Zhaojun Li. Production Planning and Supply Reliability Analysis of Core Components of Wind Turbines. International Journal of Reliability, Quality and Safety Engineering, 2026, 33(04).
Cite This Article
TY  - JOUR
AU  - Wang, Renfang
AU  - Gong, Yufei
AU  - Su, Peng
AU  - Hu, Linmin
AU  - Jiang, Xin
PY  - 2025
DA  - 2025/11/11
TI  - Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning
JO  - ICCK Transactions on Systems Safety and Reliability
T2  - ICCK Transactions on Systems Safety and Reliability
JF  - ICCK Transactions on Systems Safety and Reliability
VL  - 1
IS  - 2
SP  - 98
EP  - 113
DO  - 10.62762/TSSR.2025.621059
UR  - https://www.icck.org/article/abs/TSSR.2025.621059
KW  - reinforcement learning
KW  - economic manufacturing quantity
KW  - production inventory optimization
KW  - Q-Learning
KW  - dynamic decision-making
AB  - This study introduces a novel approach for enhancing production decision-making by applying Reinforcement Learning to optimize the Economic Manufacturing Quantity (EMQ) model within discrete-time production-inventory systems. By incorporating machine status, inventory levels, and production choices, a Markov Decision Process (MDP) is constructed and combined with the Q-learning algorithm to derive an adaptive control method. This method enables the dynamic adaptation of production decisions, by effectively balancing the normal operation and shutdown for rest states. Numerical simulations show that the suggested Reinforcement Learning model surpasses conventional EMQ models and steady-state probability models in both convergence speed and cost-effectiveness. This study offers a data-driven approach for optimizing production processes in smart manufacturing settings. It also supports the evolution of production-inventory systems from static planning to dynamic intelligent decision-making.
SN  - 3069-1087
PB  - Institute of Central Computation and Knowledge
LA  - English
ER  - 
@article{Wang2025Optimizati,
author = {Renfang Wang and Yufei Gong and Peng Su and Linmin Hu and Xin Jiang},
title = {Optimization and Control of Discrete-Time Production-Inventory Systems Using Reinforcement Learning},
journal = {ICCK Transactions on Systems Safety and Reliability},
year = {2025},
volume = {1},
number = {2},
pages = {98-113},
doi = {10.62762/TSSR.2025.621059},
url = {https://www.icck.org/article/abs/TSSR.2025.621059},
abstract = {This study introduces a novel approach for enhancing production decision-making by applying Reinforcement Learning to optimize the Economic Manufacturing Quantity (EMQ) model within discrete-time production-inventory systems. By incorporating machine status, inventory levels, and production choices, a Markov Decision Process (MDP) is constructed and combined with the Q-learning algorithm to derive an adaptive control method. This method enables the dynamic adaptation of production decisions, by effectively balancing the normal operation and shutdown for rest states. Numerical simulations show that the suggested Reinforcement Learning model surpasses conventional EMQ models and steady-state probability models in both convergence speed and cost-effectiveness. This study offers a data-driven approach for optimizing production processes in smart manufacturing settings. It also supports the evolution of production-inventory systems from static planning to dynamic intelligent decision-making.},
keywords = {reinforcement learning, economic manufacturing quantity, production inventory optimization, Q-Learning, dynamic decision-making},
issn = {3069-1087},
publisher = {Institute of Central Computation and Knowledge}
}
Publisher's Note
ICCK stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.