Chinese title: | UAV Formation Control Technology Based on Reinforcement Learning |
Name: | |
Student ID: | SX1903134 |
Confidentiality level: | Public |
Thesis language: | chi |
Discipline code: | 081101 |
Discipline: | Engineering - Control Science and Engineering - Control Theory and Control Engineering |
Student type: | Master's |
Degree: | Master of Engineering |
Year of enrollment: | 2019 |
University: | Nanjing University of Aeronautics and Astronautics |
School/Department: | |
Major: | |
Research direction: | UAV formation control |
First supervisor: | |
First supervisor's affiliation: | |
Second supervisor: | |
Completion date: | 2022-03-24 |
Defense date: | 2022-03-19 |
English title: | Formation Control Technology of UAV Based on Reinforcement Learning |
Chinese keywords: | |
English keywords: | Unmanned Aerial Vehicle; Formation Control; Reinforcement Learning; Deep Q-network; Proximal Policy Optimization |
Chinese abstract: |
UAV formations, with their wide exploration range and high mission completion rate, have been widely applied in military and civilian domains, and UAV formation control has become a research hotspot. Reinforcement learning, as a branch of artificial intelligence, has attracted wide attention in recent years. Aiming at the low degree of intelligence and the insufficient self-learning ability of UAV formations, this thesis applies reinforcement learning to the UAV formation control problem. The main contents are as follows.

First, according to the basic principles of reinforcement learning, the UAV formation problem is transformed into a Markov decision process and its related elements are designed; the designed state space, action space, and reward function can be transplanted to future studies. Under the leader-follower structure, a value-based dueling Q-network algorithm, combined with the proposed priority strategy and layered action library, realizes lateral-distance formation control, and simulation verifies the effectiveness of the designed controller.

Then, the problem is extended to two-dimensional formation-shape control. A policy-based proximal policy optimization controller is designed, and the original algorithm is improved with generalized advantage estimation. To reduce the state-space dimension without degrading the learning effect, the state space is combined with an integral-compensation method, which effectively shrinks the dimension. Simulation verifies the effectiveness of the designed controller.

Finally, the two methods above are both model-free reinforcement learning algorithms; although they eventually achieve formation control, they still require a long training time. To solve this problem, a neural network is used to fit the dynamics model, a model-based reinforcement learning algorithm is applied to the formation control problem, and the idea of model predictive control is combined to optimize the action sequence. The results show that, compared with model-free reinforcement learning, the model-based method greatly shortens the training time, verifying the effectiveness of the controller.

By introducing reinforcement learning into formation control, the follower can learn through training the best policy for tracking the leader and maintaining the desired formation distance. This thesis is a useful exploration of applying reinforcement learning to UAV formation control and has good theoretical significance and practical application prospects. |
English abstract: |
With its wide exploration range and high mission completion rate, the UAV formation has been widely used in military and civilian applications, and UAV formation control has become a research hotspot. Reinforcement learning, as a branch of artificial intelligence, has attracted wide attention in recent years. Aiming at the low degree of intelligence and the insufficient self-learning ability of UAV formations, this thesis applies reinforcement learning technology to UAV formation control and studies formation keeping for UAVs. The main contents are as follows.

Firstly, according to the principles of reinforcement learning, the UAV formation problem is transformed into a Markov decision process and the related elements are designed; the designed state space, action space, and reward function can be transplanted to future studies. Under the leader-follower structure, lateral-distance formation control is realized with the value-based Dueling Double Deep Q-Network algorithm, combined with the proposed priority strategy and layered action library. The effectiveness of the proposed controller is verified by simulation.

Then, the problem is extended to two-dimensional formation control. The controller is designed with the Proximal Policy Optimization algorithm, and the original algorithm is improved with generalized advantage estimation. To reduce the dimension of the state space without affecting the learning effect, the state space is combined with an integral-compensation method, which effectively reduces the dimension. The effectiveness of this controller is also verified by simulation.

Finally, both of the above methods are model-free reinforcement learning algorithms; although they can achieve formation control, they still need a long time to train. To learn the policy quickly, a neural network is used to fit the dynamics model, a model-based reinforcement learning algorithm is applied to the formation control problem, and the idea of model predictive control is combined to optimize the action sequence. The results show that, compared with the model-free methods, the model-based algorithm reduces the training time effectively while achieving the same formation control effect.

By introducing reinforcement learning into formation control, the follower can learn the best policy to track the leader and keep the desired formation distance. This thesis explores the application of reinforcement learning to UAV formation control, which has good theoretical significance and practical application prospects. |
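To make the Markov-decision-process formulation described in the abstract concrete, the following is a minimal sketch of a leader-follower formation environment. All names, the point-mass kinematics, the discrete action set, and the reward shaping are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

class FormationEnv:
    """Point-mass leader-follower formation keeping in the horizontal plane
    (illustrative assumption; the thesis uses its own UAV model)."""

    def __init__(self, desired_offset=(-50.0, 30.0), dt=0.1):
        self.desired = np.asarray(desired_offset)  # desired follower offset from leader (m)
        self.dt = dt
        # Discrete action library: lateral/longitudinal velocity corrections (m/s)
        self.actions = [np.array(a, dtype=float) for a in
                        [(-2, 0), (2, 0), (0, -2), (0, 2), (0, 0)]]

    def reset(self):
        self.leader = np.zeros(2)
        self.follower = self.desired + np.random.uniform(-20, 20, size=2)
        return self._state()

    def _state(self):
        # State: formation-keeping error, i.e. the follower's position relative
        # to the leader minus the desired offset.
        return (self.follower - self.leader) - self.desired

    def step(self, action_idx):
        self.leader += np.array([15.0, 0.0]) * self.dt          # leader cruises straight
        self.follower += (np.array([15.0, 0.0]) +               # follower matches cruise...
                          self.actions[action_idx]) * self.dt   # ...plus a correction
        err = np.linalg.norm(self._state())
        reward = -err                 # smaller formation error earns higher reward
        done = err > 200.0            # terminate if the formation breaks apart
        return self._state(), reward, done
```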
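The first contribution uses the value-based Dueling Double Deep Q-Network. Below is a minimal sketch of the dueling architecture only; the layer sizes and the PyTorch framing are assumptions.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: a shared trunk feeds a state-value stream V(s) and an
    advantage stream A(s, a), recombined as Q = V + (A - mean_a A)."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)
```

In the "double" variant, the training target is r + γ·Q_target(s', argmax_a Q_online(s', a)), which decouples action selection from action evaluation and reduces value overestimation.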
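The second contribution improves PPO with generalized advantage estimation. A standalone sketch of the standard GAE recursion follows; the array layout and parameter defaults are assumptions.

```python
import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation: an exponentially weighted sum of
    one-step TD errors, with lambda trading bias against variance.

    `values` must hold one extra bootstrap entry, i.e. len(rewards) + 1.
    """
    adv = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
    return adv  # adv + values[:-1] gives the value-function regression targets
```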
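The third contribution fits the dynamics with a neural network and optimizes the action sequence with model predictive control. A minimal random-shooting sketch is given below; the thesis may use a different sequence optimizer, and the dynamics-network signature and cost are assumptions.

```python
import torch

@torch.no_grad()
def mpc_action(dynamics, state, action_dim, horizon=10, n_candidates=512):
    """Random-shooting MPC over a learned dynamics network: sample candidate
    action sequences, roll each out through the model, and execute only the
    first action of the lowest-cost sequence (receding horizon).

    `dynamics` is assumed to map cat([state, action]) -> next state, and
    `state` is a 1-D tensor of the formation error."""
    seqs = torch.rand(n_candidates, horizon, action_dim) * 2.0 - 1.0  # actions in [-1, 1]
    s = state.expand(n_candidates, -1).clone()  # replicate the current state
    cost = torch.zeros(n_candidates)
    for t in range(horizon):
        s = dynamics(torch.cat([s, seqs[:, t]], dim=-1))  # predicted next state
        cost += s.norm(dim=-1)  # formation error drives the cost (assumption)
    return seqs[cost.argmin(), 0]
```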
CLC number: | V249 |
Accession number: | 2022-003-0069 |
Open date: | 2022-09-24 |