Palletizing robot trajectory planning based on A-TD3*

doi:10.16731/j.cnki.1671-3133.2025.05.006

Modern Manufacturing Engineering, 2025, Vol. 536, Issue (5): 42-52.

JIN Qiao¹, YANG Guangrui², WANG Xiao¹, XU Linghua¹, ZHANG Fang¹

1 School of Electrical Engineering, Guizhou University, Guiyang 550025, China;
2 Beijing Dart Integrated Technology Co., Ltd., Beijing 100176, China

Received: 2024-08-23; Online: 2025-05-18; Published: 2025-05-30
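Corresponding author: WANG Xiao, Ph.D., associate professor; research interests: Internet of Things theory and applications, and artificial intelligence theory and applications. E-mail: xwang9@gzu.edu.cn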
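About the authors: JIN Qiao, master's student; research interest: robot trajectory planning. YANG Guangrui, B.S., engineer; research interests: logistics automation systems and intelligent production systems. XU Linghua, Ph.D., associate professor; research interests: embedded systems, sensor networks, industrial automation, and intelligent control. ZHANG Fang, master's student; research interest: machine vision.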
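Funding:
*National Natural Science Foundation of China (61861007, 61640014); Guizhou Provincial Science and Technology Program (Qiankehe Jichu-ZK[2021] General 303); Guizhou Provincial Science and Technology Support Program (Qiankehe Zhicheng [2022] General 017, Qiankehe Zhicheng [2022] General 264, Qiankehe Zhicheng [2023] General 096, Qiankehe Zhicheng [2023] General 412, Qiankehe Zhicheng [2023] General 409); Innovation Group Project of the Department of Education of Guizhou Province (Qianjiaohe KY [2021] 012); Science and Technology Project of Power Construction Corporation of China, Ltd. (DJ-ZDXM-2022-44); Guizhou University Talent Introduction Project (Guidarenjihezi (2014) No. 08)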

Abstract

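Abstract: Deep reinforcement learning algorithms applied to trajectory planning for the robotic arm of a palletizing robot suffer from slow learning and poor robustness. To address these problems, a Twin Delayed Deep Deterministic policy gradient (TD3) algorithm based on an improved Azimuthal reward function (A) is proposed for robotic arm trajectory planning. First, a mathematical model of the palletizing robot is established in a Cartesian coordinate system and its kinematics are analyzed. Second, an improved azimuthal reward function, built on the relative direction and position of the robotic arm and the obstacles, is designed and combined with TD3 to form the A-TD3 algorithm; the reward makes the arm's search for the target more goal-directed and thereby improves learning efficiency and robustness. Simulation results show that, compared with the original TD3 algorithm, A-TD3 increases the average convergence speed by 11.84 %, increases the average reward by 4.64 %, and reduces the average range (extreme deviation) by 10.30 %. Its trajectory planning time is also shorter than that of the mainstream RRT and GA algorithms, which verifies the effectiveness of A-TD3 for palletizing robotic arm trajectory planning.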

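Key words: robotic arm, deep reinforcement learning, improved Azimuthal reward function (A), Twin Delayed Deep Deterministic policy gradient (TD3), trajectory planning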
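The abstract names two ingredients: a TD3 learner and a reward built from the relative direction and position of the arm, the target, and the obstacles. The paper's actual reward formula and network details are not given on this page, so the following is only a minimal Python/NumPy sketch under stated assumptions. `azimuthal_reward` and its weights `w_dist`, `w_dir`, `w_obs` and `safe_radius` are hypothetical and illustrate the general idea of a direction-aware reward; they are not the authors' A reward. `td3_target` follows the standard TD3 critic target (target policy smoothing plus clipped double-Q), with plain callables standing in for the target actor and target critic networks.

```python
import numpy as np


def azimuthal_reward(ee_pos, prev_ee_pos, goal, obstacles,
                     w_dist=1.0, w_dir=0.5, w_obs=1.0, safe_radius=0.1):
    """Hypothetical direction-aware reward; NOT the formula from the paper.

    Terms: negative distance to the goal, a bonus for stepping toward the
    goal, and a penalty inside `safe_radius` of an obstacle that doubles
    when the step heads straight at it.
    """
    ee_pos, prev_ee_pos, goal = (np.asarray(v, dtype=float)
                                 for v in (ee_pos, prev_ee_pos, goal))
    step = ee_pos - prev_ee_pos            # last end-effector displacement
    step_norm = np.linalg.norm(step)

    def cos_to(vec):
        # cosine between the last step and `vec`; 0 if either is degenerate
        n = np.linalg.norm(vec)
        if step_norm < 1e-9 or n < 1e-9:
            return 0.0
        return float(step @ vec) / (step_norm * n)

    to_goal = goal - ee_pos
    reward = -w_dist * np.linalg.norm(to_goal) + w_dir * cos_to(to_goal)
    for obs in obstacles:
        to_obs = np.asarray(obs, dtype=float) - ee_pos
        d_obs = np.linalg.norm(to_obs)
        if d_obs < safe_radius:
            proximity = (safe_radius - d_obs) / safe_radius  # 0 at edge, 1 at contact
            reward -= w_obs * proximity * (1.0 + max(cos_to(to_obs), 0.0))
    return float(reward)


def td3_target(reward, done, next_state, actor_target, critic1_target,
               critic2_target, gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               max_action=1.0, rng=None):
    """Standard TD3 critic target: target policy smoothing + clipped double Q."""
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_target(next_state)
    # target policy smoothing: clipped Gaussian noise on the target action
    noise = np.clip(rng.normal(0.0, policy_noise, size=a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -max_action, max_action)
    # clipped double-Q: take the smaller of the two target critics
    q_next = np.minimum(critic1_target(next_state, a_next),
                        critic2_target(next_state, a_next))
    return reward + gamma * (1.0 - done) * q_next
```

In a full agent, both critics would regress toward these targets, while the actor and the target networks are updated less often (the "delayed" part of TD3); that training loop, the replay buffer, and the arm's kinematic model are omitted here.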

CLC number: TP249

JIN Qiao, YANG Guangrui, WANG Xiao, XU Linghua, ZHANG Fang. Palletizing robot trajectory planning based on A-TD3[J]. Modern Manufacturing Engineering, 2025, 536(5): 42-52.


