Chinese title: | Research on UAV Collision Avoidance Technology Based on Reinforcement Learning |
Name: | |
Student ID: | SZ1903135 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 085210 |
Discipline: | Engineering - Engineering - Control Engineering |
Student type: | Master's |
Degree: | Master of Engineering |
Year of enrollment: | 2019 |
University: | Nanjing University of Aeronautics and Astronautics |
Department: | |
Major: | |
Research direction: | UAV obstacle avoidance |
First supervisor: | |
First supervisor's affiliation: | |
Completion date: | 2022-03-24 |
Defense date: | 2022-03-19 |
English title: |
Research on UAV Collision Avoidance Technology Based on Reinforcement Learning |
Chinese keywords: | |
English keywords: | UAV obstacle avoidance ; deep reinforcement learning ; experience pool segmentation ; multi-agent ; relevant experience sampling ; simulation system |
Chinese abstract: |
In fields such as reconnaissance and detection, power line inspection, and logistics delivery, the key to low-altitude UAV operations is ensuring safe flight. Intelligent autonomous obstacle avoidance is currently the mainstream approach to collision-free UAV flight, so this thesis explores the decision-making problem of UAV obstacle-avoidance flight in low-altitude airspace on the basis of reinforcement learning. The main research content is as follows:

Modeling and analysis of reinforcement-learning-based UAV obstacle avoidance. A UAV motion model is first established according to the obstacle-avoidance characteristics of reinforcement learning; dynamic and static obstacle models are then built for the characteristics of mountain-forest environments and described with mathematical equations; finally, a three-dimensional continuous-space model suited to UAV collision-avoidance flight missions is constructed within this environment.

For the obstacle-avoidance mission of a single UAV in an uncertain environment, a single-UAV obstacle-avoidance algorithm based on a double deep Q-network with experience pool segmentation (S-DDQN) is designed. The experience pool is first divided into positive and negative pools according to the reward of each obstacle-avoidance sample, which optimizes the sampling and training process; the state space, action space, and reward function are then designed for the characteristics of the single-UAV obstacle-avoidance environment. After static obstacle avoidance is achieved, the environment is extended with dynamic obstacles, and the state and reward are supplemented on the basis of the velocity obstacle method. Training results show that the improved algorithm has better training stability and a faster training speed; test results show that in both the static-obstacle environment and the environment with dynamic obstacles, the algorithm can select UAV actions from the UAV state and complete collision-free flight missions.

For multi-UAV collision-avoidance missions, a multi-UAV collision-avoidance algorithm based on multi-agent deep deterministic policy gradient with relevant experience sampling (RES-MADDPG) is designed. The MADDPG algorithm is first adopted to address the training instability of multi-UAV reinforcement learning; the relevant experience sampling method then attaches relevance labels to the samples generated by each agent, so that at run time the algorithm first samples and trains according to the state label and then selects UAV actions. Joint states, action spaces, and a joint reward function are designed for the characteristics of the multi-UAV obstacle-avoidance environment, and the algorithm model is trained and tested in two mission scenarios, with and without fixed target assignment. Training results show that the train-first, select-actions-later structure of the improved algorithm significantly improves training speed and obstacle-avoidance success rate; test results show that the designed algorithm model can guide the UAVs to complete multi-UAV collision-avoidance flight missions in both scenarios.

A flight simulation system for UAV collision-avoidance missions is built with PyQt5, integrating the single-UAV and multi-UAV obstacle-avoidance algorithms; simulation experiments verify the feasibility of the software's single-UAV and multi-UAV collision-avoidance functions. |
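The experience pool segmentation step described above lends itself to a short illustration. The following Python sketch shows one way to route transitions into positive and negative pools by reward sign and mix both pools in each mini-batch; the class name, capacity, and half-and-half sampling ratio are illustrative assumptions, not the thesis implementation.

    import random
    from collections import deque

    class SegmentedReplayBuffer:
        # Replay memory split by reward sign: successful experience goes to the
        # positive pool, collision/penalty experience to the negative pool.
        def __init__(self, capacity=50000):
            self.positive = deque(maxlen=capacity)  # transitions with reward >= 0
            self.negative = deque(maxlen=capacity)  # transitions with reward < 0

        def push(self, state, action, reward, next_state, done):
            pool = self.positive if reward >= 0 else self.negative
            pool.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Draw roughly half of the batch from each pool when both have data,
            # falling back to whichever pool is non-empty early in training.
            half = batch_size // 2
            batch = []
            if self.positive:
                batch += random.sample(self.positive, min(half, len(self.positive)))
            if self.negative:
                batch += random.sample(self.negative,
                                       min(batch_size - len(batch), len(self.negative)))
            return batch

A batch drawn this way would feed a standard DDQN update; the segmentation only changes which transitions the update sees, not the update rule itself.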
English abstract: |
In the fields of reconnaissance and detection, power line inspection, and logistics delivery, the key to realizing low-altitude UAV operations is ensuring their own safe flight. At present, intelligent autonomous obstacle avoidance is the mainstream means of realizing collision-free UAV flight. Therefore, this thesis explores the decision-making problems of UAV obstacle-avoidance flight in the low-altitude domain based on reinforcement learning. The main research content is as follows:

Modeling and analysis of UAV obstacle avoidance based on reinforcement learning. First, a UAV motion model is established based on the obstacle-avoidance characteristics of reinforcement learning; then dynamic and static obstacle models are established based on the characteristics of the mountain-forest environment and described with mathematical equations; finally, a three-dimensional continuous-space model suitable for UAV collision-avoidance missions is established in this environment.

For the obstacle-avoidance mission of a single UAV in an uncertain environment, a single-UAV obstacle-avoidance algorithm based on S-DDQN is designed. First, the experience pool is divided into positive and negative pools according to the reward of each obstacle-avoidance sample, optimizing the sampling and training process. On this basis, the state space, action space, and reward function of the algorithm model are designed according to the characteristics of the single-UAV obstacle-avoidance environment. After static obstacle avoidance is realized, the environment is further extended with dynamic obstacles, and the corresponding state and reward are supplemented based on the velocity obstacle method. The training results show that the improved S-DDQN algorithm has better training stability and a faster training speed. The test results show that in both the static-obstacle environment and the environment with dynamic obstacles, the algorithm can choose UAV actions based on the UAV's state and accomplish the collision-free flight mission.

For the collision-avoidance mission of multiple UAVs in an uncertain environment, a multi-UAV collision-avoidance algorithm based on RES-MADDPG is designed. First, the MADDPG algorithm is used to address the training instability problem of multi-UAV reinforcement learning; the relevant experience sampling method then attaches relevance labels to the samples generated by each agent, so that at run time the algorithm first performs sampling and training according to the state label and then selects the UAVs' actions. At the same time, the joint state, action space, and joint reward function are designed according to the characteristics of the multi-UAV obstacle-avoidance environment, and the algorithm model is trained and tested in two mission scenarios, with and without fixed target assignment. The training results show that the train-first, select-actions-later structure of the improved RES-MADDPG algorithm has a significant effect on training speed and obstacle-avoidance success rate. The test results show that the designed algorithm model can guide the UAVs to complete the multi-UAV collision-avoidance flight mission in both scenarios.

Based on PyQt5, a flight simulation system for UAV collision-avoidance missions is established, which integrates the single-UAV and multi-UAV obstacle-avoidance algorithms; simulation experiments on the system verify the feasibility of its single-UAV and multi-UAV collision-avoidance functions. |
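Both abstracts mention supplementing the state and reward with the velocity obstacle method once dynamic obstacles are introduced. A minimal sketch of the underlying geometric test, under assumed names and a single scalar safety radius, is given below; the thesis's actual state and reward design is not reproduced here.

    import numpy as np

    def in_velocity_obstacle(p_uav, v_uav, p_obs, v_obs, r_safe):
        # True when the UAV's velocity relative to a moving obstacle points
        # inside the collision cone, i.e. the current velocities lead to a
        # predicted conflict.
        rel_p = np.asarray(p_obs, dtype=float) - np.asarray(p_uav, dtype=float)
        rel_v = np.asarray(v_uav, dtype=float) - np.asarray(v_obs, dtype=float)
        dist = np.linalg.norm(rel_p)
        if dist <= r_safe:            # already inside the safety sphere
            return True
        speed = np.linalg.norm(rel_v)
        if speed == 0.0:              # no relative motion, no predicted conflict
            return False
        # Half-angle of the collision cone subtended by the safety sphere.
        half_angle = np.arcsin(r_safe / dist)
        # Angle between the relative velocity and the line of sight.
        cos_angle = np.clip(np.dot(rel_v, rel_p) / (speed * dist), -1.0, 1.0)
        return np.arccos(cos_angle) < half_angle

A positive test could, for instance, contribute a penalty term to the reward or an extra feature to the state vector, which is the role the abstracts assign to the velocity obstacle method.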
References: | |
CLC number: | V249 |
Holding number: | 2022-003-0062 |
Open access date: | 2022-09-24 |