Modern Manufacturing Engineering, 2026, Vol. 548, Issue (5): 18-30. doi: 10.16731/j.cnki.1671-3133.2026.05.003

• Advanced Manufacturing System Management and Operation •

Research on real-time scheduling method of modular integrated construction workshop based on deep reinforcement learning*

FAN Yi, LIU Siqi, SHEN Liezheng, ZHU Haiping

  1. School of Mechanical Science & Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2025-04-28 Online: 2026-05-18 Published: 2026-06-04
  • Corresponding author: ZHU Haiping, PhD, professor. Main research interests: production system modeling and optimization, industrial big data, intelligent manufacturing, and digital workshop applications. E-mail: haipzhu@hust.edu.cn
  • About the authors: FAN Yi, master's student; main research interest: intelligent optimization scheduling. LIU Siqi, PhD student; main research interest: intelligent scheduling optimization. SHEN Liezheng, PhD student; main research interest: production system modeling and optimization. E-mail: m202370728@hust.edu.cn; d202480324@hust.edu.cn; lzshen@hust.edu.cn
  • Funding:
    *National Key Research and Development Program of China (2023YFB3307900)

Abstract: Modular Integrated Construction (MIC) is an emerging construction paradigm that is now widely used in the production of building components. Because demand for customized components is growing and the workshop production environment is complex and changeable, advanced real-time scheduling methods are urgently needed to cope with new production modes and to respond to dynamic events. To this end, a real-time scheduling method based on Deep Reinforcement Learning (DRL) is proposed for the MIC workshop scheduling problem. First, the production process and characteristics of the MIC workshop are analyzed, the workshop is abstracted as a production workshop with mixed-flow production characteristics, and the corresponding mathematical model is built. Second, by defining scheduling nodes in the production time series, the scheduling problem is modeled as a Markov Decision Process (MDP). Next, a general state space with 21 production features, an action space of 8 composite rules derived by Genetic Programming (GP), and a reward function are designed in turn. On this basis, a Proximal Policy Optimization with Dual Memory Pools (PPO-DMP) algorithm is proposed to train the scheduling agent, which learns an efficient mapping from production states to scheduling rules and thereby optimizes the scheduling objective. Finally, comparative experiments show that the proposed method achieves better scheduling performance and dynamic adaptability than traditional methods, and that its advantage is more pronounced in scenarios with new order insertions.

Key words: deep reinforcement learning, modular integrated construction, real-time scheduling, Markov decision process
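
The MDP framing described in the abstract (observe production features at a scheduling node, choose a dispatching rule as the action, receive a reward) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the 21 state features or the 8 GP composite rules, so three simple features and four classic rules (SPT, EDD, FIFO, minimum slack) stand in for them; `threshold_policy` is a hand-written placeholder for the trained PPO-DMP agent; and a single machine replaces the mixed-flow MIC workshop model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Job:
    name: str
    proc_time: float  # processing time on the (single) machine
    due_date: float
    arrival: float    # used only to order jobs under FIFO


# Stand-in action space: each action is a dispatching rule that picks one job.
# These classic rules are assumptions, not the paper's GP composite rules.
def spt(queue: Sequence[Job], now: float) -> Job:
    return min(queue, key=lambda j: j.proc_time)


def edd(queue: Sequence[Job], now: float) -> Job:
    return min(queue, key=lambda j: j.due_date)


def fifo(queue: Sequence[Job], now: float) -> Job:
    return min(queue, key=lambda j: j.arrival)


def min_slack(queue: Sequence[Job], now: float) -> Job:
    return min(queue, key=lambda j: j.due_date - now - j.proc_time)


RULES: List[Callable[[Sequence[Job], float], Job]] = [spt, edd, fifo, min_slack]


def state_features(queue: Sequence[Job], now: float) -> List[float]:
    """Toy state: queue length, mean processing time, tightest slack."""
    n = len(queue)
    mean_proc = sum(j.proc_time for j in queue) / n
    tightest = min(j.due_date - now - j.proc_time for j in queue)
    return [float(n), mean_proc, tightest]


def threshold_policy(state: List[float]) -> int:
    """Placeholder for a trained agent: use EDD if any job is already
    late-bound (negative slack), otherwise SPT."""
    return 1 if state[2] < 0 else 0


def run_episode(
    jobs: Sequence[Job], policy: Callable[[List[float]], int]
) -> Tuple[List[str], List[float]]:
    """Each time the machine becomes free is one scheduling node."""
    queue = list(jobs)
    now = 0.0
    order: List[str] = []
    rewards: List[float] = []
    while queue:
        action = policy(state_features(queue, now))  # state -> rule index
        job = RULES[action](queue, now)              # rule -> next job
        queue.remove(job)
        now += job.proc_time
        tardiness = max(0.0, now - job.due_date)
        rewards.append(-tardiness)                   # toy reward: penalize lateness
        order.append(job.name)
    return order, rewards


if __name__ == "__main__":
    demo = [Job("X", 1.0, 100.0, 0.0), Job("Y", 5.0, 4.0, 1.0)]
    print(run_episode(demo, threshold_policy))
```

In a learned version, `threshold_policy` would be replaced by a policy network trained with a PPO-style update, and the reward would follow the paper's own scheduling objective, which the abstract does not specify.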

