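Title: | Research on a Graph Neural Network Software Defect Prediction Method Based on Dual-View Graphs |
Author: | |
Student ID: | SZ2216135 |
Confidentiality Level: | Public |
Language: | chi (Chinese) |
Discipline Code: | 085404 |
Discipline: | Engineering - Electronic Information - Computer Technology |
Student Type: | Master's student |
Degree: | Professional Master's Degree |
Enrollment Year: | 2022 |
University: | Nanjing University of Aeronautics and Astronautics |
School/Department: | |
Major: | |
Research Area: | Software Engineering |
Supervisor: | |
Supervisor Affiliation: | |
Completion Date: | 2025-03-25 |
Defense Date: | 2025-03-12 |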
English Title: |
The Research on Software Defect Prediction Method Using Graph Neural Networks Based on Dual-View Graphs |
Keywords: | |
English Keywords: | Software Defect Prediction; Graph Neural Networks; Dual-View Homogeneous Graph; Dual-View Heterogeneous Graph; Developer Features |
Abstract: |
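Software defect prediction aims to identify high-risk defective modules during software development so that resources can be allocated efficiently. In recent years, defect prediction methods based on graph neural networks have received wide attention and study because they capture the interactions between modules more comprehensively. As software systems grow more complex and development teams grow larger, developers' behavioral characteristics and collaboration patterns have an increasingly significant influence on how defects arise. However, existing methods have converged on similar graph construction approaches: they typically build a homogeneous graph from a single code perspective and overlook the role of developers in software development. Moreover, methods that use heterogeneous graphs for defect prediction are currently lacking. Overcoming these limitations of current graph construction and innovating effectively in graph structure has become a new research challenge.

This thesis proposes two graph-neural-network-based software defect prediction methods. Combining the developer and code perspectives, they construct two novel graphs, a dual-view homogeneous graph and a dual-view heterogeneous graph, apply them to graph neural networks, and validate them in a practical application setting. The main contributions are as follows:

(1) To address the single perspective of current software dependency graphs, this thesis proposes a graph neural network defect prediction method based on a dual-view homogeneous graph (DeDuVGN). The method introduces a dual-view software dependency graph construction strategy that integrates code dependencies and developer dependencies to characterize the multi-dimensional relationships in a software system more comprehensively. In addition, DeDuVGN combines minority-class oversampling with a bidirectional gated graph neural network, improving the model's ability to identify minority-class defects. Experimental results show that DeDuVGN significantly outperforms existing methods on multiple evaluation metrics across several open-source software projects, improving the F1 score by 10.7% over the current state-of-the-art method.

(2) To address the lack of heterogeneous-graph-based defect prediction, this thesis extends the homogeneous graph structure and proposes a graph neural network defect prediction method based on a dual-view heterogeneous graph (DeDVHeGN). The model introduces two node types, code modules and developers, into the heterogeneous graph and defines multiple types of dependency edges to reflect the complex dependencies in a software system more comprehensively. The method also establishes a multi-dimensional developer feature extraction scheme and improves the heterogeneous graph neural network with a graph node sampling strategy, effectively alleviating the defect class imbalance problem. Experimental results show that DeDVHeGN outperforms other advanced models on multiple standard datasets, improving the F1 score by 13.3% over the current state-of-the-art method. The thesis also compares the strengths and weaknesses of DeDuVGN and DeDVHeGN and discusses the scenarios each is suited to.

(3) A software defect prediction system based on the two methods above is designed and implemented. The system provides a reliable application tool. It packages the methods as functional modules and presents them through a visual, interactive interface. The aim is to lower the learning barrier for people involved in defect prediction and to make the prediction process more intuitive, efficient, and convenient. |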
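The two ingredients of contribution (1) are dual-view graph construction and minority-class oversampling. A minimal Python sketch of both follows, and it rests on stated assumptions. The record does not say how developer dependencies are derived. The sketch therefore assumes one common rule: two modules are linked in the developer view when at least one developer modified both. Code-view edges are treated as undirected for simplicity. The oversampling step is plain SMOTE (Chawla et al., 2002) in feature space. How DeDuVGN attaches synthetic samples to the graph is not described here. The function names `build_dual_view_graph` and `smote` are hypothetical.

```python
import math
import random
from itertools import combinations


def build_dual_view_graph(code_deps, authorship):
    """Merge code-view and developer-view edges over the same module nodes.

    code_deps:  iterable of (module_a, module_b) static code dependencies.
    authorship: dict mapping a developer to the set of modules they modified.
    Returns a dict mapping each undirected edge (a, b) with a < b to the set
    of views ("code", "developer") that contribute it.
    """
    edges = {}
    # Code view: static dependencies, treated as undirected (simplification).
    for a, b in code_deps:
        if a != b:
            edges.setdefault(tuple(sorted((a, b))), set()).add("code")
    # Developer view (assumed rule): link every pair of modules one developer touched.
    for modules in authorship.values():
        for a, b in combinations(sorted(modules), 2):
            edges.setdefault((a, b), set()).add("developer")
    return edges


def smote(minority, n_new, k=5, seed=0):
    """Classic SMOTE: interpolate between a minority sample and one of its
    k nearest minority-class neighbours. Needs at least two samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    graph = build_dual_view_graph(
        code_deps=[("A", "B"), ("B", "C")],
        authorship={"alice": {"A", "C"}, "bob": {"B", "C"}},
    )
    print(graph)
    print(smote([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]], n_new=2, k=2))
```

In this toy input the edge ("B", "C") carries both labels because it appears in both views. The edge ("A", "C") exists only through a shared developer. This is the kind of relationship a code-only dependency graph would miss.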
English Abstract: |
Software defect prediction aims to identify high-risk defective modules during the software development process to enable efficient resource allocation. In recent years, defect prediction methods based on Graph Neural Networks (GNNs) have attracted wide attention because they capture the interactions between modules more comprehensively, thus improving prediction performance. As software systems become more complex and development teams grow, developer behaviors and collaboration patterns have a more significant influence on defect introduction. However, existing methods have converged on similar graph construction approaches. They typically build homogeneous graphs from a code-only perspective, neglecting the role of developers in software development. Furthermore, there is a lack of methods that use heterogeneous graphs for defect prediction. Overcoming the limitations of current graph construction methods and innovating in graph structure design has become a new challenge in the field. The thesis proposes two GNN-based software defect prediction methods, each taking a dual-view approach that incorporates both developer and code perspectives. Two novel graphs, a dual-view homogeneous graph and a dual-view heterogeneous graph, are constructed and applied to GNNs, and the methods are validated in real-world application scenarios. The main contributions of the thesis are as follows:

(1) To address the single perspective of current software dependency graphs, the thesis proposes a graph neural network defect prediction method based on dual-view homogeneous graphs (DeDuVGN). This method introduces a dual-view software dependency graph construction approach that integrates code dependencies and developer dependencies. Its aim is a more comprehensive representation of the multi-dimensional relationships in a software system. In addition, DeDuVGN combines minority-class oversampling with a bidirectional gated graph neural network to address the data imbalance problem, improving the model's ability to identify minority-class defects. Experimental results show that DeDuVGN significantly outperforms existing methods on multiple evaluation metrics across several open-source software projects, improving the F1 score by 10.7% over the current state-of-the-art method.

(2) In response to the lack of defect prediction methods using heterogeneous graphs, the thesis extends the homogeneous graph structure and proposes a graph neural network defect prediction method based on dual-view heterogeneous graphs (DeDVHeGN). This model introduces two node types, code modules and developers, into the heterogeneous graph. It defines multiple types of dependency edges to reflect the complex dependencies in a software system more comprehensively. The method also develops a multi-dimensional developer feature extraction approach. It improves heterogeneous graph neural networks with a graph node sampling strategy that mitigates the data imbalance issue. Experimental results show that DeDVHeGN outperforms other advanced models on multiple benchmark datasets, improving the F1 score by 13.3% over the current state-of-the-art method. The thesis also compares the strengths and weaknesses of DeDuVGN and DeDVHeGN and discusses the scenarios each is suited to.

(3) A software defect prediction system based on the two methods above is designed and implemented. The system provides a reliable application tool by packaging the methods as functional modules and presenting them through a visual, interactive interface. The goal is to lower the learning barrier for defect prediction practitioners and to make the defect prediction process more intuitive, efficient, and convenient. |
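The heterogeneous graph in contribution (2) can be pictured as typed edge lists keyed by (source type, relation, destination type) triples. Heterogeneous graph libraries such as DGL and PyTorch Geometric use the same canonical edge-type convention. The minimal Python sketch below is an illustration under stated assumptions. The relation names (`depends_on`, `modifies`, `modified_by`, `collaborates_with`) are hypothetical. So is the rule that links developers who modified the same module. The abstract also does not say which node sampling strategy DeDVHeGN uses. The hypothetical `balanced_sample` shows one simple possibility: class-balanced undersampling of labeled module nodes.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_hetero_graph(code_deps, authorship):
    """Return typed edge lists keyed by (src_type, relation, dst_type)."""
    g = defaultdict(list)
    # Module -> module code dependencies.
    for src, dst in code_deps:
        g[("module", "depends_on", "module")].append((src, dst))
    # Developer <-> module authorship, stored in both directions so that
    # messages can flow either way in a heterogeneous GNN.
    owners = defaultdict(set)
    for dev in sorted(authorship):
        for m in sorted(authorship[dev]):
            g[("developer", "modifies", "module")].append((dev, m))
            g[("module", "modified_by", "developer")].append((m, dev))
            owners[m].add(dev)
    # Developer <-> developer collaboration (assumed rule: one edge per
    # module that both developers modified).
    for m in sorted(owners):
        for a, b in combinations(sorted(owners[m]), 2):
            g[("developer", "collaborates_with", "developer")].append((a, b))
    return dict(g)


def balanced_sample(labels, seed=0):
    """Undersample the majority class so defective (1) and clean (0) module
    nodes appear equally often (one possible sampling strategy)."""
    rng = random.Random(seed)
    pos = sorted(n for n, y in labels.items() if y == 1)
    neg = sorted(n for n, y in labels.items() if y == 0)
    k = min(len(pos), len(neg))
    return sorted(rng.sample(pos, k) + rng.sample(neg, k))


if __name__ == "__main__":
    hg = build_hetero_graph(
        code_deps=[("A", "B")],
        authorship={"alice": {"A", "B"}, "bob": {"B"}},
    )
    for edge_type, pairs in hg.items():
        print(edge_type, pairs)
    print(balanced_sample({"A": 1, "B": 0, "C": 0, "D": 0}))
```

The sketch keeps reverse relations such as `modified_by` explicit. Many heterogeneous GNN layers only pass messages along the stored edge direction, so without them module nodes could not receive developer features.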
CLC Number: | TP311 |
Accession Number: | 2025-016-0155 |
Open Access Date: | 2025-09-29 |