
Title:

 Research on Cooperative Path Planning Methods for UAV Formations in Complex Environments

Author:

 Zhou Jinmei

Student ID:

 SZ2203023

Confidentiality Level:

 Public

Language:

 Chinese

Discipline Code:

 085406

Discipline:

 Engineering - Electronic Information - Control Engineering

Student Type:

 Master's

Degree:

 Master of Engineering

Year of Enrollment:

 2022

University:

 Nanjing University of Aeronautics and Astronautics

Department:

 College of Automation

Major:

 Electronic Information (Professional Degree)

Research Direction:

 Intelligent Decision-Making

Supervisor:

 Ding Yong

Supervisor's Affiliation:

 College of Automation

Date of Completion:

 2025-01-13

Date of Defense:

 2025-03-09

English Title:

 Research on Cooperative Path Planning of UAV Formation in Complex Environment

Keywords:

 UAV formation ; path planning ; swarm intelligence optimization ; dynamic obstacle avoidance ; multi-agent reinforcement learning ; optimal reciprocal collision avoidance

English Keywords:

 UAV formation ; path planning ; swarm intelligence optimization ; dynamic obstacle avoidance ; MARL ; optimal reciprocal collision avoidance

Abstract:

With the rapid development of UAV technology, the application of multi-UAV systems in complex dynamic environments has become a research hotspot. When executing cooperative tasks, multi-UAV systems face environmental uncertainty and dynamic change, and traditional path planning methods often fall short of multiple requirements such as global optimality and real-time performance. This thesis studies cooperative path planning for UAV formations in complex environments. The main work is as follows:

For multi-UAV cooperative path planning under spatio-temporal constraints, a cooperative path planning method based on a multi-strategy fusion grey wolf optimizer (GWO) is proposed. First, a multi-UAV flight environment is built, and an objective function is designed around performance constraints such as route length and flight speed. Second, to counter the blindness and randomness of population initialization in the standard GWO, an initialization strategy fusing greedy optimization with a mutation threshold is proposed, preserving population diversity while raising individual fitness. Then, to balance the algorithm's global search and local refinement, the original linear convergence factor is replaced with a nonlinear one. Next, a dynamic weighting rule is introduced, yielding a position update that combines dynamic weighting with the static average and makes strategy updates more adaptive. Finally, simulation experiments verify the effectiveness and superiority of the algorithm.

For UAV formation path planning in unknown environments, a formation path planning method based on the leader-follower scheme and IAPF-EMADDPG is proposed. First, control objectives based on the leader-follower method and consensus theory are designed for the mission scenario. Second, an EMADDPG formation framework with a long- and short-term prioritized experience selection mechanism is constructed: an LSTM network retains historical training information, and the prioritized selection mechanism weights samples by importance indicators such as the TD error to build a high-quality training set and accelerate learning. Then, to address the sparse-reward problem, a dense reward strategy based on a potential function is designed, with an adjustment factor that steers the agents out of local optima. Finally, simulations demonstrate the effectiveness of the algorithm for formation path planning in unknown environments.
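The dense-reward idea above can be sketched with potential-based shaping. The distance-based potential, the discount gamma, and the knob k standing in for the adjustment factor are illustrative assumptions, not the thesis's exact design:

```python
import math

def potential(pos, goal):
    # Phi(s): negative Euclidean distance to the goal, so the potential
    # rises as the UAV gets closer.
    return -math.dist(pos, goal)

def shaped_reward(r_sparse, pos, next_pos, goal, gamma=0.99, k=1.0):
    # Potential-based shaping: r' = r + k * (gamma * Phi(s') - Phi(s)).
    # k plays the role of an adjustment factor: raising it strengthens
    # the pull toward the goal when training stalls in a local optimum.
    return r_sparse + k * (gamma * potential(next_pos, goal) - potential(pos, goal))
```

With k = 1 this is classic potential-based shaping, which is known to leave the optimal policy unchanged while densifying the reward signal.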

To address the insufficient real-time performance of dynamic obstacle avoidance in UAV formation path planning, an improved MAAC formation obstacle avoidance method combined with ORCA is proposed. First, a UAV formation flight environment is built on multi-agent reinforcement learning, with the action space, state space and reward function defined. Second, a multi-head self-attention mechanism is introduced into the Critic network to improve policy learning. Then, to strengthen real-time response to dynamic obstacles, an ORCA avoidance module guides learning toward positive rewards in the early phase, and a progressive two-stage training scheme further optimizes the path once rewards stabilize. Finally, simulation results show that the method effectively improves the safety and efficiency of real-time path planning in complex dynamic environments.

English Abstract:

With the rapid development of UAV technology, the application of multi-UAV systems in complex dynamic environments has gradually become a research hotspot. When performing collaborative tasks, multi-UAV systems face environmental uncertainty and dynamic changes, and traditional path planning methods often struggle to meet multiple requirements such as global optimality and real-time performance. This paper therefore focuses on cooperative path planning for UAV formations in complex environments. The main work is as follows:

Aiming at multi-UAV collaborative path planning under spatiotemporal constraints, a collaborative path planning method based on a multi-strategy fusion grey wolf optimization algorithm is proposed. First, a multi-UAV flight environment is established, and the objective function is designed around performance constraints such as range and flight speed. Second, considering the blindness and randomness of population initialization in the traditional grey wolf optimizer, a strategy integrating greedy optimization and a mutation threshold at the initialization stage is proposed, ensuring population diversity while improving individual fitness. Then, to balance the algorithm's global exploration and local exploitation, the linear convergence factor is improved with a nonlinear schedule. Next, a dynamic weighting rule is introduced, combining a dynamically weighted average with the static average to improve the flexibility and adaptability of the position update. Finally, the effectiveness and superiority of the algorithm are verified through simulation experiments.
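As a rough illustration of the two GWO modifications above, the sketch below uses a cosine-shaped nonlinear decay for the convergence factor and optional per-leader weights in the position update; both the cosine schedule and the weight values are assumptions for illustration, not the thesis's exact formulas:

```python
import math
import random

def convergence_factor(t, t_max):
    # Nonlinear decay of the GWO convergence factor a from 2 to 0.
    # Standard GWO uses the linear schedule a = 2 * (1 - t / t_max);
    # a cosine curve keeps a large early (global exploration) and
    # shrinks it faster late (local exploitation).
    return 2.0 * math.cos((t / t_max) * math.pi / 2.0)

def update_position(x, alpha, beta, delta, a, weights=None):
    # One GWO position update toward the alpha, beta and delta wolves.
    # weights=None reproduces the classic static average; fitness-based
    # weights give the dynamically weighted variant described above.
    moves = []
    for leader in (alpha, beta, delta):
        r1, r2 = random.random(), random.random()
        A = 2.0 * a * r1 - a          # exploration/exploitation coefficient
        C = 2.0 * r2                  # leader-emphasis coefficient
        D = [abs(C * l - xi) for l, xi in zip(leader, x)]
        moves.append([l - A * d for l, d in zip(leader, D)])
    if weights is None:
        weights = [1 / 3, 1 / 3, 1 / 3]
    s = sum(weights)
    return [sum(w * m[i] for w, m in zip(weights, moves)) / s
            for i in range(len(x))]
```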

Aiming at UAV formation path planning in unknown environments, a formation path planning method based on the leader-follower scheme and IAPF-EMADDPG is proposed. Firstly, control objectives based on the leader-follower method and consensus theory are designed according to the mission scenario. Secondly, an EMADDPG formation framework is constructed around a long- and short-term prioritized experience selection mechanism: an LSTM network stores historical training information, and the prioritized selection mechanism weights samples by importance indicators such as the TD error to generate a high-quality training set and accelerate learning. Then, to solve the sparse-reward problem, a dense reward strategy based on a potential function is designed, and an adjustment factor guides the agents out of local optima. Finally, the effectiveness of the algorithm for formation path planning in unknown environments is demonstrated through simulation experiments.
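The TD-error-weighted sampling can be sketched as follows. The exponent alpha and floor eps are conventional prioritized-replay defaults assumed here, not values taken from the thesis:

```python
import random

def td_priorities(td_errors, alpha=0.6, eps=1e-3):
    # Proportional prioritization: p_i = (|delta_i| + eps) ** alpha,
    # so transitions the critic predicts poorly get larger priority;
    # eps keeps zero-error transitions sampleable.
    return [(abs(e) + eps) ** alpha for e in td_errors]

def sample_batch(buffer, td_errors, k, rng=random):
    # Weighted sampling (with replacement) over the replay buffer:
    # high-TD-error transitions are drawn more often, which is the core
    # of the prioritized experience selection described above.
    pri = td_priorities(td_errors)
    total = sum(pri)
    idx = rng.choices(range(len(buffer)), weights=[p / total for p in pri], k=k)
    return [buffer[i] for i in idx]
```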

Aiming at the insufficient real-time performance of dynamic obstacle avoidance in UAV formation path planning, an improved MAAC formation obstacle avoidance method combined with ORCA is proposed. Firstly, a UAV formation flight environment is constructed based on multi-agent reinforcement learning, and the action space, state space and reward function are defined. Secondly, a multi-head self-attention mechanism is introduced into the Critic network to improve policy learning. Then, to enhance real-time response to dynamic obstacles, an ORCA obstacle-avoidance module guides learning to converge quickly to positive rewards in the early stage, and a progressive two-stage training method further optimizes the path once rewards stabilize. Finally, simulation results demonstrate that the method effectively improves the safety and efficiency of real-time path planning in complex dynamic environments.
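The attention pooling inside such a Critic can be illustrated in miniature. The embedding sizes and the two-head split below are placeholders, and a real MAAC critic applies learned linear projections for queries, keys and values (omitted here):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pool(query, keys, values):
    # Scaled dot-product attention: the agent's own embedding (query)
    # attends over the other agents' encoded observation-action pairs.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, kv)) / math.sqrt(d) for kv in keys]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]

def multi_head(query, keys, values, heads=2):
    # Split the embedding into `heads` sub-spaces, attend in each, and
    # concatenate: each head can focus on a different neighbour, e.g.
    # one on the nearest obstacle and one on the formation leader.
    d = len(query) // heads
    out = []
    for h in range(heads):
        sl = slice(h * d, (h + 1) * d)
        out += attention_pool(query[sl], [k[sl] for k in keys],
                              [v[sl] for v in values])
    return out
```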


CLC Number:

 TP391

Accession Number:

 2025-003-0153

Open Access Date:

 2025-09-25
