现代制造工程 (Modern Manufacturing Engineering), 2025, Vol. 538, Issue (7): 20-30. doi: 10.16731/j.cnki.1671-3133.2025.07.003

• Advanced Manufacturing System Management and Operation •

Research on reinforcement learning decision algorithm for multi-objective dynamic job shop scheduling*

ZHANG Ningning1, WAN Weibing1, ZHANG Mengxiao1, ZHAO Yuming2

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China;
  2. Department of Automation, Shanghai Jiao Tong University, Shanghai 201100, China
  • Received: 2024-06-11  Online: 2025-07-18  Published: 2025-08-04
  • Corresponding author: WAN Weibing, associate professor, master's supervisor, Ph.D.; main research interest: artificial intelligence. E-mail: wbwan@sues.edu.cn
  • About the authors: ZHANG Ningning, master's student; main research interests: intelligent manufacturing and deep reinforcement learning. ZHANG Mengxiao, master's student; main research interest: artificial intelligence. E-mail: m020220118@sues.edu.cn; ZHAO Yuming, associate professor, master's supervisor, Ph.D.; main research interests: image processing, pattern recognition and computer vision. E-mail: mx_zhang@sues.edu.cn
  • Funding:
    *Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project of the Ministry of Science and Technology (2020AAA0109300)


Abstract: To address the multi-objective dynamic job shop scheduling problem and meet the real-time scheduling needs of manufacturing workshops in environments of variable scale, a method combining Proximal Policy Optimization (PPO) with GoogLeNet, named GLN-PPO, is proposed. The method constructs the state space of the scheduling problem using multidimensional matrices, designs an action space based on multiple priority rules, and devises a multi-objective reward function. To verify the effectiveness of the proposed algorithm, it is trained and tested in three environments: a static public environment based on public benchmark instances, a static real environment based on actual cases, and a dynamic real environment. Experimental results show that, compared with the genetic algorithm, GLN-PPO provides high-quality scheduling results, meets the real-time scheduling requirements of enterprises, and adapts flexibly to scheduling environments of variable scale.

Key words: deep reinforcement learning, job shop scheduling, GoogLeNet, Proximal Policy Optimization (PPO)
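
Note: the abstract describes GLN-PPO only at a high level, so the sketch below is a rough, hypothetical illustration of the kind of architecture it outlines: a small GoogLeNet-style Inception block serves as the feature extractor over a multidimensional state matrix, and PPO-style actor and critic heads select among priority dispatching rules. The channel counts, state shape, and the example rule set (SPT, LPT, FIFO, MWKR) are assumptions made for illustration, not the authors' configuration.

# Illustrative sketch only: a GoogLeNet-style (Inception) feature extractor feeding
# PPO-style actor/critic heads that choose among priority dispatching rules.
# Channel counts, state shape, and the rule list are assumptions, not the paper's settings.
import torch
import torch.nn as nn

DISPATCH_RULES = ["SPT", "LPT", "FIFO", "MWKR"]  # assumed example action set

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions plus pooling, concatenated on channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 8, kernel_size=1)
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 8, 1), nn.ReLU(),
                                nn.Conv2d(8, 16, 3, padding=1))
        self.b5 = nn.Sequential(nn.Conv2d(in_ch, 4, 1), nn.ReLU(),
                                nn.Conv2d(4, 8, 5, padding=2))
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 8, 1))

    def forward(self, x):
        # All branches preserve spatial size, so their outputs can be concatenated.
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1))

class GLNPPONet(nn.Module):
    """Actor-critic network: Inception features -> policy over rules + state value."""
    def __init__(self, in_ch, n_rules=len(DISPATCH_RULES)):
        super().__init__()
        self.features = nn.Sequential(InceptionBlock(in_ch),          # 8+16+8+8 = 40 channels
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.actor = nn.Linear(40, n_rules)   # logits over dispatching rules
        self.critic = nn.Linear(40, 1)        # state-value estimate for PPO's advantage

    def forward(self, state):
        h = self.features(state)
        return torch.distributions.Categorical(logits=self.actor(h)), self.critic(h)

# Toy usage: the state is a stack of job-shop feature matrices (e.g. processing times,
# operation progress, machine availability), one channel each; sizes here are arbitrary.
if __name__ == "__main__":
    state = torch.randn(1, 3, 10, 10)          # batch of 1, 3 feature channels, 10 jobs x 10 machines
    net = GLNPPONet(in_ch=3)
    dist, value = net(state)
    action = dist.sample()                     # index of the dispatching rule to apply next
    print(DISPATCH_RULES[action.item()], value.item())

A complete GLN-PPO agent would additionally wrap such a network in PPO's clipped-surrogate update and interact with a job-shop simulation that applies the sampled rule at each decision point; those components are omitted here.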

