Modern Manufacturing Engineering, 2025, Vol. 536, Issue (5): 42-52. doi: 10.16731/j.cnki.1671-3133.2025.05.006

• Robotics •

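Palletizing robot trajectory planning based on A-TD3*

JIN Qiao1, YANG Guangrui2, WANG Xiao1, XU Linghua1, ZHANG Fang1

  1. School of Electrical Engineering, Guizhou University, Guiyang 550025, China;
  2. Beijing Dart Integrated Technology Co., Ltd., Beijing 100176, China
  • Received: 2024-08-23 Online: 2025-05-18 Published: 2025-05-30
  • Corresponding author: WANG Xiao, Ph.D., associate professor. Main research interests: Internet of Things theory and applications, artificial intelligence theory and applications. E-mail: xwang9@gzu.edu.cn
  • About the authors: JIN Qiao, master's student. Main research interest: robot trajectory planning. YANG Guangrui, bachelor's degree, engineer. Main research interests: logistics automation systems and intelligent production systems. XU Linghua, Ph.D., associate professor. Main research interests: embedded systems, sensor networks, industrial automation, and intelligent control. ZHANG Fang, master's student. Main research interest: machine vision.
  • Funding:
    *National Natural Science Foundation of China (61861007, 61640014); Guizhou Provincial Science and Technology Program (Qiankehe Jichu-ZK[2021] General 303); Guizhou Provincial Science and Technology Support Program (Qiankehe Zhicheng [2022] General 017, Qiankehe Zhicheng [2022] General 264, Qiankehe Zhicheng [2023] General 096, Qiankehe Zhicheng [2023] General 412, Qiankehe Zhicheng [2023] General 409); Innovation Group Project of the Guizhou Provincial Department of Education (Qianjiaohe KY [2021] 012); Science and Technology Project of Power Construction Corporation of China, Ltd. (DJ-ZDXM-2022-44); Guizhou University Introduced Talent Project (Guidarenjihezi (2014) No. 08)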

Abstract: Deep reinforcement learning algorithms applied to trajectory planning for palletizing robot arms suffer from slow learning and poor robustness. To address these problems, a Twin Delayed Deep Deterministic policy gradient (TD3) algorithm based on an improved Azimuthal reward function (A) is proposed for robotic arm trajectory planning. First, a mathematical model of the palletizing robot is established in the Cartesian coordinate system and its kinematics are analyzed. Second, based on the relative directions and positions of the robotic arm and the obstacles, an improved azimuthal reward function is designed and combined with TD3 (A-TD3) for trajectory planning of the palletizing robot arm, which makes the arm's target search more goal-directed and improves learning efficiency and robustness. Simulation results show that, compared with the original TD3 algorithm, A-TD3 improves the average convergence speed by 11.84 %, raises the average reward by 4.64 %, and reduces the average range by 10.30 %. Its trajectory planning time is also shorter than that of the mainstream rapidly-exploring random tree (RRT) and genetic algorithm (GA) planners, which verifies the effectiveness of A-TD3 for trajectory planning of palletizing robot arms.

Key words: robotic arm, deep reinforcement learning, improved Azimuthal reward function (A), Twin Delayed Deep Deterministic policy gradient (TD3), trajectory planning
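
The abstract names TD3 without spelling out what "twin" and "delayed" refer to. The sketch below illustrates the three standard TD3 ingredients (target policy smoothing, clipped double-Q targets, and delayed actor updates) for a single transition. It is not the authors' implementation and does not include their azimuthal reward: the function names, the plain-callable networks, and the hyperparameter defaults (taken from common TD3 practice) are assumptions made for illustration only.

```python
import random


def td3_target(reward, next_state, done,
               actor_target, critic1_target, critic2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute the TD3 critic target for one transition.

    actor_target(state) -> list of floats (hypothetical target policy)
    criticN_target(state, action) -> float (hypothetical target critics)
    """
    # Target policy smoothing: add clipped Gaussian noise to the target
    # action, then clip the result to the valid action range.
    next_action = []
    for a in actor_target(next_state):
        noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        next_action.append(max(action_low, min(action_high, a + noise)))

    # Clipped double-Q ("twin"): use the smaller of the two critic estimates
    # to counter value overestimation.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))

    # Bootstrapped target; terminal transitions do not bootstrap.
    return reward + gamma * (0.0 if done else q_next)


def should_update_actor(step, policy_delay=2):
    """Delayed updates: refresh the actor once every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    # Toy stand-ins for the networks, purely for demonstration.
    actor = lambda s: [0.5, -2.0]
    critic_a = lambda s, a: sum(a) + 1.0
    critic_b = lambda s, a: 2.0
    y = td3_target(1.0, [0.0], False, actor, critic_a, critic_b,
                   gamma=0.9, noise_std=0.0)
    print(y, should_update_actor(4))
```

In an A-TD3-style setup, the `reward` argument is where a direction- and position-aware reward would enter; the target computation itself is unchanged from plain TD3.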

