
Chinese Title:

 Research on the Theory and Technology of Intelligent Cooperation of UAV Swarms Based on Digital Twins

Name:

 Shen Gaoqing

Student ID:

 BX1904002

Confidentiality Level:

 Public

Thesis Language:

 Chinese (chi)

Discipline Code:

 081001

Discipline Name:

 Engineering - Information and Communication Engineering - Communication and Information Systems

Student Type:

 Doctoral

Degree:

 Doctor of Engineering

Year of Enrollment:

 2019

University:

 Nanjing University of Aeronautics and Astronautics

School:

 College of Electronic and Information Engineering / College of Integrated Circuits

Major:

 Information and Communication Engineering

Research Direction:

 Intelligent Networking and Cooperation of Aircraft

First Supervisor:

 Lei Lei

First Supervisor's Affiliation:

 College of Electronic and Information Engineering / College of Integrated Circuits

Completion Date:

 2023-11-01

Defense Date:

 2023-12-15

English Title:

 Research on the Theory and Technology of Intelligent Cooperation of UAV Swarms Based on Digital Twins

Chinese Keywords:

 UAV swarms; cooperative control; artificial intelligence; deep reinforcement learning; digital twins

English Keywords:

 UAV swarms; cooperative control; artificial intelligence; deep reinforcement learning; digital twins

Chinese Abstract:

In recent years, with the wide application of UAV swarms in both military and civilian fields, swarm cooperative control methods based on pre-formation, adaptation, and human decision-making can no longer meet increasingly complex mission requirements, and traditional procedural cooperative control is bound to be replaced by intelligent cooperative control methods. The rapid development of artificial intelligence technology, represented by deep reinforcement learning, offers a new way to achieve intelligent swarm cooperation. However, how to raise the training speed of decision models in a high-fidelity manner has long been the key bottleneck restricting the application of deep reinforcement learning to intelligent cooperative control of swarms. To break through this bottleneck, this dissertation proposes a digital twin-based training method for deep reinforcement learning decision models and, for three typical applications, namely cooperative trajectory planning, cooperative target search, and cooperative electronic jamming, proposes corresponding deep reinforcement learning cooperative decision models, providing theoretical and technical support for building an intelligent cooperative system for UAV swarms. The main work and contributions of this dissertation are as follows:

(1) To address the difficulty of training deep reinforcement learning decision models, a digital twin-based training method for swarm cooperative reinforcement learning decision models is proposed. First, a digital twin-based virtual-real simulation framework for intelligent UAV swarms is presented. The framework consists of four parts, namely physical entities, twin models, decision models, and digital twin simulation middleware, and provides efficient simulation support for training and validating deep reinforcement learning decision models. Then, on top of the constructed swarm digital twin simulation environment, a "twin training, distributed decision-making, continuous evolution" method for deep reinforcement learning decision models is proposed, which improves the efficiency of sample collection by building multiple twin environment replicas that can be trained in parallel and supports the autonomous evolution of decision models during cooperative task execution. Simulation and field test results show that the proposed digital twin-based training method effectively supports decision model training and improves both the training speed and the transferability of decision models.

(2) For the cooperative trajectory planning problem of UAV swarms, a behavior coupling-deep deterministic policy gradient (BCDDPG) deep reinforcement learning algorithm is proposed. Inspired by flocking behavior in nature, BCDDPG adopts a decompose-then-couple multi-sub-policy network architecture, which helps the policy network interpret environmental state information and generate higher-quality cooperative swarm behavior. In addition, BCDDPG uses long short-term memory (LSTM) networks in its sub-policy networks to strengthen the agents' understanding of historical environmental information, mitigating the partial observability of the swarm trajectory planning problem and accelerating model convergence. Simulation results show that, driven by BCDDPG, a UAV swarm can reach the target point in an autonomous and cooperative manner while avoiding obstacles. Comparative experiments show that BCDDPG outperforms existing algorithms on multiple metrics, including average arrival rate, average arrival time, and average collision rate.

(3) For the cooperative target search problem of UAV swarms, a deep reinforcement learning decision method based on a value function decomposition architecture is proposed. First, a linear-computation-based fusion mechanism for neighbor detection information is proposed, which preserves the validity of detection information while greatly reducing the communication load and decision computation, thereby improving decision efficiency. Then, information probability maps are combined with multi-agent deep reinforcement learning: a scalable agent state space is constructed on the basis of graphical representation, and the agent reward function is designed around two metrics, target search rate and region coverage rate, to guide UAVs in searching for targets autonomously. Finally, a deep noisy network-based value function decomposition (Deep Noisy QMIX, DNQMIX) algorithm is proposed, which introduces noisy networks into the policy network of the QMIX algorithm and greatly improves the agents' exploration capability. Simulation results show that DNQMIX converges better than existing deep reinforcement learning algorithms and outperforms swarm intelligence optimization algorithms in target search rate and region coverage rate.

(4) For the cooperative electronic jamming problem of UAV swarms, a policy gradient-based deep reinforcement learning decision method is proposed. First, a multifunctional radar system is modeled, the mechanisms of four active jamming patterns are studied, and the effect of different jamming patterns on the radar's detection probability is analyzed. Then, for the cooperative electronic jamming problem, the agent's action space, state space, and reward function are designed around the joint optimization of jamming targets, jamming patterns, and jamming power, so as to maximize jamming effectiveness. Finally, an adaptive learning rate proximal policy optimization (APPO) algorithm is proposed to improve the convergence speed and convergence performance of the decision model. Simulation results show that, compared with existing deep reinforcement learning algorithms, APPO more effectively reduces the detection probability of enemy radars and raises the penetration success rate.

The research results of this dissertation are expected to break through the key bottleneck restricting the application of deep reinforcement learning to cooperative control of UAV swarms, promote the realization of intelligent swarm cooperative control, raise the intelligence level of swarm cooperation, and provide new concepts, theories, and technologies for building future unmanned intelligent cooperative systems.

English Abstract:

In recent years, with the widespread application of unmanned aerial vehicle (UAV) swarms in military and civilian fields, cooperative control methods based on pre-formation, adaptation, and human decision-making can no longer meet increasingly complex task requirements, and traditional procedural cooperative control methods are inevitably being replaced by intelligent cooperative control methods. The rapid development of artificial intelligence technologies such as deep reinforcement learning (DRL) offers new approaches to achieving intelligent cooperative control of UAV swarms. However, the key bottleneck in applying DRL to intelligent cooperative control of UAV swarms has been how to improve the training speed of decision models in a high-fidelity manner. To address this bottleneck, this dissertation proposes a DRL decision model training method based on digital twins (DT) and further proposes distinct DRL-based cooperative decision models for three typical applications: cooperative trajectory planning, cooperative target search, and cooperative electronic jamming. These results provide theoretical and technical support for the construction of an intelligent cooperative system for UAV swarms. The main work and contributions of this dissertation are as follows:

(1) To address the difficulty of training DRL decision models, a DT-based training method for the DRL decision models of UAV swarms is proposed. First, a DT-based virtual-real simulation framework comprising physical entities, twin models, decision models, and DT simulation middleware is proposed; it efficiently supports the training and validation of DRL decision models. Then, building on the constructed DT simulation environment for UAV swarms, a "twin training, distributed decision-making, and continuous evolution" method for DRL decision models is proposed. This method improves the efficiency of sample collection by building multiple twin environment replicas that can be trained in parallel, and it supports the autonomous evolution of decision models during the execution of cooperative tasks. Simulation and field test results indicate that the proposed DT-based training method effectively supports decision model training, improving both training speed and transferability.
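The abstract does not give the implementation of the parallel twin replicas, but the idea can be illustrated with a minimal Python sketch: several independent copies of a twin (simulation) environment collect experience concurrently and feed a shared replay buffer, so sample collection scales with the number of replicas. All names below (TwinEnv, rollout) are illustrative assumptions, not the dissertation's actual interfaces.

```python
# Minimal sketch of parallel sample collection over twin environment replicas.
import random
from multiprocessing import Pool

class TwinEnv:
    """Toy stand-in for one digital-twin replica of the swarm environment."""
    def reset(self):
        self.t = 0
        return (0.0, 0.0)                        # toy 2-D state
    def step(self, action):
        self.t += 1
        next_state = (random.gauss(0, 1), random.gauss(0, 1))
        reward = -abs(action)                    # toy reward
        return next_state, reward, self.t >= 50  # state, reward, done

def rollout(seed):
    """Collect one episode of transitions from an independent replica."""
    random.seed(seed)
    env = TwinEnv()
    state, done, traj = env.reset(), False, []
    while not done:
        action = random.uniform(-1, 1)           # placeholder for the policy
        next_state, reward, done = env.step(action)
        traj.append((state, action, reward, next_state, done))
        state = next_state
    return traj

if __name__ == "__main__":
    with Pool(processes=4) as pool:              # four parallel twin replicas
        batches = pool.map(rollout, range(4))
    replay_buffer = [t for batch in batches for t in batch]
    print(len(replay_buffer), "transitions collected in parallel")
```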

(2) For the cooperative trajectory planning problem of UAV swarms, a behavior coupling-deep deterministic policy gradient (BCDDPG) DRL algorithm is proposed. Inspired by the collective behavior of biological groups in nature, BCDDPG uses a decompose-then-couple multi-sub-policy network architecture that helps the policy network interpret environmental state information and generate higher-quality cooperative swarm behavior. In addition, BCDDPG employs long short-term memory (LSTM) networks in its sub-policy networks to enhance the agents' understanding of historical environmental information, mitigating the partial observability of the trajectory planning problem and improving the convergence speed of the model. Simulation results show that, driven by BCDDPG, UAV swarms can reach the target autonomously and cooperatively while avoiding obstacles. Comparative experiments demonstrate that BCDDPG outperforms existing algorithms in average arrival rate, average arrival time, and average collision rate.
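As a rough illustration of the decompose-then-couple idea, the PyTorch sketch below builds several LSTM-based behavior sub-policies whose candidate actions are fused by a state-dependent weighting layer. The number of sub-policies and the fusion rule are assumptions; the abstract does not specify BCDDPG's exact architecture.

```python
# Hedged sketch of a decompose-then-couple actor with LSTM sub-policies.
import torch
import torch.nn as nn

class SubPolicy(nn.Module):
    """One behavior sub-policy: LSTM over recent observations, then an action head."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)
    def forward(self, obs_seq):                         # (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        return torch.tanh(self.head(out[:, -1]))        # action from the last step

class CoupledActor(nn.Module):
    """Fuses candidate actions from several sub-policies with learned weights."""
    def __init__(self, obs_dim, act_dim, n_behaviors=3):
        super().__init__()
        self.subs = nn.ModuleList(
            [SubPolicy(obs_dim, act_dim) for _ in range(n_behaviors)])
        self.weights = nn.Linear(obs_dim, n_behaviors)  # state-dependent coupling
    def forward(self, obs_seq):
        acts = torch.stack([s(obs_seq) for s in self.subs], dim=1)
        w = torch.softmax(self.weights(obs_seq[:, -1]), dim=-1)
        return (w.unsqueeze(-1) * acts).sum(dim=1)      # fused action

actor = CoupledActor(obs_dim=12, act_dim=3)
action = actor(torch.randn(8, 10, 12))                  # 8 agents, 10-step history
print(action.shape)                                     # torch.Size([8, 3])
```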

(3) For the cooperative target search problem of UAV swarms, a DRL decision method based on a value function decomposition architecture is proposed. First, a linear-computation-based fusion mechanism for neighbor detection information is proposed, which ensures the validity of detection information while significantly reducing the communication load and decision computation, thereby improving decision efficiency. Next, combining information probability maps with multi-agent deep reinforcement learning (MADRL), a scalable agent state space is constructed based on graphical representation, and the reward function is designed around both the target search rate and the region coverage rate to guide UAVs in searching for targets autonomously. Finally, a deep noisy network-based value function decomposition (Deep Noisy QMIX, DNQMIX) algorithm is proposed, which introduces noisy networks into the policy network of the QMIX algorithm and significantly enhances the agents' exploration capability. Simulation results indicate that DNQMIX converges better than existing DRL algorithms and outperforms swarm intelligence optimization algorithms in target search rate and region coverage rate.
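The noisy-network component that DNQMIX introduces follows the NoisyNet idea of Fortunato et al.: ordinary linear layers in the per-agent Q-networks are replaced by layers whose weights carry learned, factorized Gaussian noise, so exploration is driven by parameter noise rather than epsilon-greedy action selection. A sketch of such a layer is given below; the hyperparameters are illustrative.

```python
# Sketch of a factorized-Gaussian noisy linear layer (NoisyNet-style).
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    def __init__(self, in_f, out_f, sigma0=0.5):
        super().__init__()
        self.in_f, self.out_f = in_f, out_f
        bound = 1 / math.sqrt(in_f)
        self.mu_w = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
        self.mu_b = nn.Parameter(torch.zeros(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 * bound))
    @staticmethod
    def _f(x):                                   # factorized-noise scaling
        return x.sign() * x.abs().sqrt()
    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_f, device=x.device))
        eps_out = self._f(torch.randn(self.out_f, device=x.device))
        weight = self.mu_w + self.sigma_w * eps_out.outer(eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return nn.functional.linear(x, weight, bias)

q_head = NoisyLinear(64, 5)                      # per-agent Q-head over 5 actions
print(q_head(torch.randn(2, 64)).shape)          # torch.Size([2, 5])
```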

(4) For the cooperative electronic jamming problem of UAV swarms, a policy gradient-based DRL decision method is proposed. First, a multifunctional radar system is modeled, the mechanisms of four active jamming patterns are studied, and the impact of different jamming patterns on the radar's detection probability is analyzed. Then, the agent's action space, state space, and reward function are designed around the joint optimization of jamming targets, jamming patterns, and jamming power to maximize jamming effectiveness. Finally, an adaptive learning rate proximal policy optimization (APPO) algorithm is proposed to improve the convergence speed and convergence performance of the decision model. Simulation results show that, compared with existing DRL algorithms, APPO effectively reduces the detection probability of enemy radars and increases the penetration success rate.
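The abstract does not state APPO's adaptation rule. One common way to make PPO's learning rate adaptive, sketched below under that assumption, is to scale it by the measured KL divergence between successive policies: shrink the rate when the policy moves too far per update and grow it when updates are too timid. The thresholds and factors here are illustrative.

```python
# Hedged sketch of KL-driven learning-rate adaptation for a PPO optimizer.
import torch

def adapt_lr(optimizer, kl, kl_target=0.01, factor=1.5,
             lr_min=1e-5, lr_max=1e-2):
    """Adjust the optimizer's learning rate from the observed policy KL."""
    for group in optimizer.param_groups:
        lr = group["lr"]
        if kl > 2.0 * kl_target:          # policy moved too far: slow down
            lr = max(lr / factor, lr_min)
        elif kl < 0.5 * kl_target:        # policy barely moved: speed up
            lr = min(lr * factor, lr_max)
        group["lr"] = lr

policy = torch.nn.Linear(8, 2)            # toy policy network
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
adapt_lr(opt, kl=0.05)                    # KL above threshold -> rate shrinks
print(opt.param_groups[0]["lr"])          # 0.0002
```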

The research results of this dissertation are expected to break through the key bottleneck restricting the application of DRL to UAV swarm cooperative control, promote the realization of intelligent swarm cooperative control, raise the intelligence level of swarm cooperation, and provide new concepts, theories, and technologies for the construction of future unmanned intelligent cooperative systems.


CLC Number:

 TP181

Call Number:

 2024-004-0002

Open Access Date:

 2024-07-11
