Research on reinforcement learning decision algorithm for multi-objective dynamic job shop scheduling

doi:10.16731/j.cnki.1671-3133.2025.07.003

Abstract

To address the multi-objective dynamic job shop scheduling problem and meet the real-time scheduling needs of manufacturing workshops whose problem scale varies, this paper proposes GLN-PPO, a method that combines Proximal Policy Optimization (PPO) with GoogLeNet. The method represents the state space of the scheduling problem as multidimensional matrices, builds the action space from a set of priority rules, and defines a multi-objective reward function. To verify its effectiveness, the algorithm is trained and tested in three environments: a static public environment based on common benchmark problems, a static real environment based on actual cases, and a dynamic real environment. Experimental results show that, compared with genetic algorithms, GLN-PPO provides high-quality scheduling results, meets the real-time scheduling requirements of enterprises, and adapts flexibly to environments of varying scale.
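To make the idea of a rule-based action space and a multi-objective reward more concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not the formulation used in the paper: the four rules (SPT, LPT, FIFO, EDD), the two objectives (makespan and tardiness), the equal weights, and all names such as `Op`, `RULES`, `select_job`, and `reward` are hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Next pending operation of a job (hypothetical structure)."""
    job: int          # job identifier
    proc_time: float  # processing time of the operation
    arrival: float    # time at which the job became ready
    due: float        # due date of the job


# Assumed action space: each discrete action selects one priority rule.
# The rule set here is illustrative, not the one used in the paper.
RULES = {
    0: ("SPT", lambda op: op.proc_time),    # shortest processing time first
    1: ("LPT", lambda op: -op.proc_time),   # longest processing time first
    2: ("FIFO", lambda op: op.arrival),     # first in, first out
    3: ("EDD", lambda op: op.due),          # earliest due date first
}


def select_job(action, ready_ops):
    """Apply the priority rule chosen by the agent and return the job to dispatch."""
    _, key = RULES[action]
    return min(ready_ops, key=key).job


def reward(delta_makespan, delta_tardiness, weights=(0.5, 0.5)):
    """Assumed weighted-sum reward: penalize growth in each objective."""
    w_makespan, w_tardiness = weights
    return -(w_makespan * delta_makespan + w_tardiness * delta_tardiness)


ready = [Op(0, 5.0, 0.0, 20.0), Op(1, 2.0, 1.0, 10.0), Op(2, 8.0, 2.0, 15.0)]
chosen = [select_job(a, ready) for a in range(4)]
step_reward = reward(2.0, 4.0)
```

In a setup like this, the policy network outputs one of the rule indices at each decision point rather than a specific job, which is one way such an agent could remain usable when the number of jobs or machines changes.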

Keywords: deep reinforcement learning; job shop scheduling; GoogLeNet; Proximal Policy Optimization (PPO)

CLC Number: TP18; TH164

ZHANG Ningning, WAN Weibing, ZHANG Mengxiao, ZHAO Yuming. Research on reinforcement learning decision algorithm for multi-objective dynamic job shop scheduling[J]. Modern Manufacturing Engineering, 2025, 538(7): 20-30.
