Palletizing robot trajectory planning based on A-TD3

doi:10.16731/j.cnki.1671-3133.2025.05.006

Abstract

Abstract: The application of deep reinforcement learning algorithms in palletizing robotic arm trajectory planning suffer from slow learning rate and poor robustness. To address the above problems,a Twin Delayed Deep Deterministic policy gradient (TD3) algorithm based on improved Azimuthal reward function (A) is proposed for trajectory planning of robotic arm. First,the mathematical model of the palletizing robot is established in Cartesian coordinate system and its kinematic analysis is carried out. Second,for the problems of slow learning rate and poor robustness,based on the relative directions and positions of the robotic arm and the obstacles,an improved Azimuthal reward function combined with Twin Delayed Deep Deterministic policy gradient (A-TD3) algorithm is designed for the palletizing robotic arm trajectory planning,which enhances the robotic arm target oriented search,and improves the learning efficiency and robustness. Simulation results show that compared with the TD3 algorithm,the average convergence speed of A-TD3 algorithm is improved by 11.84 %,the average reward value is improved by 4.64 %,the average extreme deviation is decreased by 10.30 %,and the trajectory planning time is lower than that of the mainstream RRT and GA algorithms,which verifies the effectiveness of the A-TD3 algorithm in the application of palletizing robotic arm trajectory planning.

Key words: robotic arm, deep reinforcement learning, improved Azimuthal reward function (A), Twin Delayed Deep Deterministic policy gradient (TD3), trajectory planning

CLC Number:

TP249

JIN Qiao, YANG Guangrui, WANG Xiao, XU Linghua, ZHANG Fang. Palletizing robot trajectory planning based on A-TD3[J]. Modern Manufacturing Engineering, 2025, 536(5): 42-52.

References

[1] 郭勇,赖广.工业机器人关节空间轨迹规划及优化研究综述[J].机械传动,2020,44(2):154-165.
[2] 董理,杨东,鹿建森.工业机器人轨迹规划方法综述[J].控制工程,2022,29(12):2365-2374.
[3] ZHAN F,XIA R F,CHEN X X. An optimal trajectory planning algorithm for autonomous trucks:architecture,algorithm,and experiment[J]. International Journal of Advanced Robotic Systems,2020,17(2):429-434.
[4] CHENG S B,YANG G C,JIN M H,et al. A novel strategy for a 7-DOF space manipulator transferring a captured target with collision avoidance[J]. Advances in Space Research,2024,73 (12):6255-6273.
[5] CHEN T,WANG Y K,WEN H,et al. Autonomous assembly of multiple flexible spacecraft using RRT^* algorithm and input shaping technique[J].Nonlinear Dynamics,2023,111(12):11223-11241.
[6] ZHU Z X,YIN Y,LYU H G. Automatic collision avoidance algorithm based on route-plan-guided artificial potential field method[J]. Ocean Engineering,2023,271:113737.
[7] GUPTA P,PRATIHAR D K,DEB K. Analysis and optimization of gait cycle of 25-DOF NAO robot using particle swarm optimization and genetic algorithms[J]. International Journal of Humanoid Robotics,2023,21 (2):2350011.
[8] CHEN Y Q,GUO J L,YANG H D,et al. Research on navigation of bidirectional A^* algorithm based on ant colony algorithm[J].The Journal of Supercomputing,2020,77(2):1-18.
[9] AKDAG M,PEDERSEN T A,FOSSEN T I,et al. A decision support system for autonomous ship trajectory planning[J]. Ocean Engineering,2024,292:116562.
[10] GONG S M,MENG W,BO G,et al. Bayesian optimization enhanced deep reinforcement learning for trajectory planning and network formation in multi-UAV networks[J]. IEEE Transactions on Vehicular Technology,2023,72(8):10933-10948.
[11] PALACIOS M E,INCA S,MONSERRAT J F. Multipath planning acceleration method with double deep r-learning based on a genetic algorithm[J]. IEEE Transactions on Vehicular Technology,2023,72(10):12681-12696.
[12] LEE H T,KIM M K. Optimal path planning for a ship in coastal waters with deep Q network[J]. Ocean Engineering,2024,307:118193.
[13] XUE D L,WU D F,YAMASHITA A S,et al. Proximal policy optimization with reciprocal velocity obstacle based collision avoidance path planning for multi-unmanned surface vehicles[J]. Ocean Engineering,2023,273:114005.
[14] ZHANG K X,RUAN J G,LI T Y,et al. The effects investigation of data-driven fitting cycle and deep deterministic policy gradient algorithm on energy management strategy of dual-motor electric bus[J]. Energy,2023,269:126760.
[15] YANG Y,LI J T,PENG L L. Multi-robot path planning based on a deep reinforcement learning DQN algorithm[J]. CAAI Transactions on Intelligence Technology,2020,5(3):177-183.
[16] 胡晓东,张宽,谢圆,等.“嫦娥五号”月面采样机械臂路径规划[J].深空探测学报(中英文),2021,8(6):564-571.
[17] JIN X,WANG Z X. Proximal policy optimization based dynamic path planning algorithm for mobile robots[J]. Electronics Letters,2021,58 (1):13-15.
[18] LIN J F,HAN Y,GAO C Y,et al. Intelligent ship anti-rolling control system based on a deep deterministic policy gradient algorithm and the Magnus effect[J]. Physics of Fluids,2022,34(5):1-10.
[19] ZHOU C,WANG Y T,WANG L,et al. Obstacle avoidance strategy for an autonomous surface vessel based on modified deep deterministic policy gradient[J]. Ocean Engineering,2022,243:110166.
[20] YU J M,SUN H,SUN J Q. Improved twin delayed deep deterministic policy gradient algorithm based real-time trajectory planning for parafoil under complicated constraints[J]. Applied Sciences,2022,12(16):8189.
[21] 李跃,邵振洲,赵振东,等.面向轨迹规划的深度强化学习奖励函数设计[J].计算机工程与应用,2020,56(2):226-232.
[22] WANG H J,GAO W,WANG Z,et al. Research on obstacle avoidance planning for UUV based on A3C algorithm[J]. Journal of Marine Science and Engineering,2023,12(1):63.
[23] GAO Q,LIU Y B,ZHAO J B,et al. Hybrid deep learning for dynamic total transfer capability control[J]. IEEE Transactions on Power Systems,2021,36(3):2733-2736.
[24] LILLICRAP T P,HUNT J J,PRITZEL A,et al. Continuous control with deep reinforcement learning[J]. Arxiv Preprint Arxiv,2015,25:1-10.
[25] TAN Y Q,SHEN Y X,YU X Y,et al. Low carbon economic dispatch of the combined heat and powervirtual power plants:a improved deep reinforcement learningbased approach[J]. IET Renewable Power Generation,2022,17(4):982-1007.
[26] ZHENG Q Y,TIAN Y,DENG Y,et al. Reinforcement learning-based control of single-track two-wheeled robots in narrow terrain[J]. Actuators,2023,12(3):109.
[27] CHEN S G,TANG B,WANG K. Twin delayed deep deterministic policy gradient-based intelligent computation offloading for IoT[J]. Digital Communications and Networks,2023,9(4):836-845.
[28] LI J T,ZHANG T X,LIU K. Memory-Enhanced Twin Delayed Deep Deterministic Policy Gradient (ME-TD3)-based unmanned combat aerial vehicle trajectory planning for avoiding radar detection threats in dynamic and unknown environments[J]. Remote Sensing,2023,15 (23):5494.
[29] 江安旎,杜煜,原颖,等.基于GA-TD3算法的交叉路口决策模型[J].计算机应用研究,2024,41(7):1-7.
[30] ZHOU Y T,KONG X R,LIN K P,et al. Novel task decomposed multi-agent twin delayed deep deterministic policy gradient algorithm for multi-UAV autonomous path planning[J]. Knowledge-Based Systems,2024,287:111462.
[31] 杨淑华,谢晓波,邴振凯,等.基于HER-TD3算法的青皮核桃采摘机械臂路径规划[J].农业机械学报,2024,55(4):113-123.
[32] HU Y,CAO N,LU H,et al. Multi-dimensional resource management with deep deterministic policy gradient for digital twin-enabled industrial internet of things in 6 generation[J]. Transactions on Emerging Telecommunications Technologies,2024,35(4):e4962.
[33] 李亚,王卫岗,张原,等.基于改进型TD3算法的车载边缘计算任务卸载决策[J].电子测量技术,2024,47(6):64-70.
[34] CAI R G,LI X. Path planning method for manipulators based on improved twin delayed deep deterministic policy gradient and RRT^*[J]. Applied Sciences,2024,14(7):2765.