Title:

 面向工业异常检测的高效跨模态映射与融合方法研究

Author:

 钟志清

Student ID:

 SZ2216020

Confidentiality Level:

 Public

Language:

 Chinese

Discipline Code:

 085404

Discipline:

 Engineering - Electronic Information - Computer Technology

Student Type:

 Master's

Degree:

 Professional Master's Degree

Year of Enrollment:

 2022

University:

 Nanjing University of Aeronautics and Astronautics

Department:

 College of Computer Science and Technology / College of Artificial Intelligence

Major:

 Electronic Information (Professional Degree)

Supervisor:

 冯爱民

Supervisor's Affiliation:

 College of Computer Science and Technology / College of Artificial Intelligence

Completion Date:

 2024-12-30

Defense Date:

 2025-03-17

Title (English):

 Research on Efficient Cross-Modal Mapping and Fusion Methods for Industrial Anomaly Detection

Keywords:

 Industrial Anomaly Detection; Cross-modal Learning; Multi-modal Fusion; Feature Reconstruction; Efficient Training

Keywords (English):

 Industrial Anomaly Detection; Cross-modal Learning; Multi-modal Fusion; Feature Reconstruction; Efficient Training

Abstract:

Industrial anomaly detection is vital to ensuring product quality and production efficiency. However, traditional methods that rely on single-modality image data often struggle to achieve satisfactory results in complex industrial scenarios with subtle defects. For example, fine surface defects, complex geometric distortions, and flaws that are hard to distinguish from normal regions all pose serious challenges to single-image detection methods. These challenges have driven researchers to explore multi-modal approaches, such as combining images with 3D point cloud data, to obtain more comprehensive and accurate product information. Although multi-modal methods can improve the accuracy of industrial anomaly detection, most existing methods struggle to balance computational efficiency with high accuracy, and their simple feature fusion strategies also reduce model interpretability. To this end, this thesis studies efficient cross-modal feature mapping and fusion methods for industrial anomaly detection. The specific work is as follows:

1. To address the difficulty existing industrial anomaly detection methods have in balancing computational efficiency and detection accuracy, this thesis proposes an efficient multi-modal anomaly detection framework, DREAM. Through an innovative dual reconstruction mechanism and an asynchronous training strategy, DREAM effectively improves the performance and efficiency of multi-modal anomaly detection. The dual reconstruction mechanism performs bidirectional mapping between the 2D and 3D feature spaces, strengthening the model's understanding of cross-modal relationships and improving detection accuracy. The asynchronous training strategy trains the feature mapping networks of the different modalities in separate stages, improving training efficiency. In addition, DREAM adopts a lightweight network architecture, further reducing memory footprint and inference time. Experimental results on the MVTec 3D-AD and Eyecandies datasets show that DREAM markedly improves computational efficiency while maintaining high detection accuracy, providing an efficient solution for multi-modal industrial anomaly detection.

2. To address the shortcomings of existing multi-modal anomaly detection methods in modality fusion and interpretability, this thesis proposes a novel multi-modal anomaly detection framework, OmniView. OmniView's core innovation is its sample-level modality fusion mechanism, which generates multi-view 2D images to effectively fuse 3D geometric information with 2D image information, significantly improving model interpretability and avoiding the modality interference that simple feature-level concatenation can introduce. Its training-free, efficient alignment and aggregation module fuses multi-view image features into a compact representation whose dimensionality is only 40% of that of directly concatenated features, further improving computational efficiency. OmniView inherits and extends DREAM's efficient asynchronous training strategy, and further improves performance and efficiency by having the fused features interact with the cross-modal features. Experimental results on the MVTec 3D-AD and Eyecandies datasets verify OmniView's advantages in interpretability and performance, demonstrating the method's effectiveness in complex industrial scenarios.

Abstract (English):

Industrial anomaly detection is crucial for ensuring product quality and production efficiency. However, traditional methods relying on single-modal image data often fail to achieve satisfactory results when facing complex industrial scenarios and subtle defects. For instance, subtle surface defects, complex geometric distortions, and flaws that are difficult to distinguish from normal regions pose significant challenges to single-image-based detection methods. These challenges have motivated researchers to explore multi-modal approaches, such as combining image and 3D point cloud data, to obtain more comprehensive and accurate product information. Although multi-modal methods can improve the accuracy of industrial anomaly detection, existing approaches often struggle to balance computational efficiency and high precision, while their simple feature fusion strategies also reduce model interpretability. To address these issues, this thesis investigates efficient cross-modal feature mapping and fusion methods for industrial anomaly detection. The specific work is as follows:

1. To address the challenge of balancing computational efficiency and detection accuracy in existing industrial anomaly detection methods, this thesis proposes an efficient multi-modal anomaly detection framework called DREAM. DREAM effectively improves the performance and efficiency of multi-modal anomaly detection through an innovative dual reconstruction mechanism and an asynchronous training strategy. The dual reconstruction mechanism enhances the model's understanding of cross-modal relationships and improves detection accuracy through bidirectional mapping between the 2D and 3D feature spaces. The asynchronous training strategy trains the feature mapping networks of the different modalities in stages, improving training efficiency. Additionally, DREAM adopts a lightweight network architecture, further reducing memory consumption and inference time. Experimental results on the MVTec 3D-AD and Eyecandies datasets demonstrate that DREAM significantly improves computational efficiency while maintaining high detection accuracy, providing an efficient solution for multi-modal industrial anomaly detection.
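The bidirectional-mapping idea behind dual reconstruction can be sketched in a few lines. This is a hypothetical, minimal stand-in, not the thesis's actual networks: it assumes per-patch 2D and 3D features of normal samples are related by a learnable mapping, fits one linear mapper per direction on normal data only, and uses the mapping residual as the anomaly score at test time. All dimensions and data below are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for patch features: on normal samples, 2D (image) features
# are assumed to be a linear function of 3D (point-cloud) features plus noise.
F3d_train = rng.normal(size=(500, 16))                      # normal-sample 3D features
W_true = rng.normal(size=(16, 32))                          # hidden cross-modal relation
F2d_train = F3d_train @ W_true + 0.01 * rng.normal(size=(500, 32))

# "Dual reconstruction": fit one mapper per direction on normal data only.
W_3to2, *_ = np.linalg.lstsq(F3d_train, F2d_train, rcond=None)  # 3D -> 2D
W_2to3, *_ = np.linalg.lstsq(F2d_train, F3d_train, rcond=None)  # 2D -> 3D

def anomaly_score(f2d, f3d):
    # Anomalies violate the learned cross-modal relationship, so the
    # summed bidirectional mapping residual serves as the anomaly score.
    err_2d = np.linalg.norm(f3d @ W_3to2 - f2d, axis=-1)
    err_3d = np.linalg.norm(f2d @ W_2to3 - f3d, axis=-1)
    return err_2d + err_3d

normal_3d = rng.normal(size=(10, 16))
normal_2d = normal_3d @ W_true
anomalous_2d = normal_2d + rng.normal(size=(10, 32))  # a defect breaks the relation

print(anomaly_score(normal_2d, normal_3d).mean())     # low
print(anomaly_score(anomalous_2d, normal_3d).mean())  # high
```

In this toy setting the asynchronous idea would correspond to fitting `W_3to2` and `W_2to3` independently, one stage per direction, rather than jointly.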

2. To address the limitations of existing multi-modal anomaly detection methods in modality fusion and interpretability, this thesis proposes a novel multi-modal anomaly detection framework called OmniView. The core innovation of OmniView lies in its sample-level modality fusion mechanism, which effectively combines 3D geometric information and 2D image information through multi-view 2D image generation, significantly enhancing model interpretability while avoiding the modal interference that simple feature-level concatenation may introduce. Its training-free, efficient alignment and aggregation module fuses multi-view image features into a compact representation with only 40% of the dimensionality of directly concatenated features, further improving computational efficiency. OmniView inherits and extends the DREAM framework's efficient asynchronous training strategy, and further enhances performance and efficiency by organically interacting the fused features with the cross-modal features. Experimental results on the MVTec 3D-AD and Eyecandies datasets validate the advantages of the OmniView framework in interpretability and performance, demonstrating its effectiveness in complex industrial scenarios.
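As a rough illustration of the multi-view idea (not the thesis's actual rendering pipeline or aggregation module), the sketch below orthographically projects a point cloud into depth maps from several hypothetical azimuth viewpoints, then aggregates per-view features with a training-free element-wise mean, yielding a representation of dimensionality d instead of the n_views × d produced by direct concatenation. The resolutions, view count, and "features" are illustrative assumptions.

```python
import numpy as np

def render_depth_views(points, n_views=4, res=32):
    """Orthographically project a point cloud into depth maps from
    n_views azimuth angles -- a minimal stand-in for multi-view rendering."""
    views = []
    zmin = points[:, 2].min()
    for k in range(n_views):
        a = 2 * np.pi * k / n_views
        R = np.array([[np.cos(a), -np.sin(a), 0.0],
                      [np.sin(a),  np.cos(a), 0.0],
                      [0.0, 0.0, 1.0]])
        p = points @ R.T
        # Map x, y onto the pixel grid; keep the largest depth value per pixel.
        xy = ((p[:, :2] - p[:, :2].min(axis=0)) /
              (np.ptp(p[:, :2], axis=0) + 1e-8) * (res - 1)).astype(int)
        depth = np.zeros((res, res))
        for (x, y), z in zip(xy, p[:, 2]):
            depth[y, x] = max(depth[y, x], z - zmin)
        views.append(depth)
    return views

def aggregate(view_feats):
    # Training-free aggregation: an element-wise mean across views gives a
    # compact representation (d dims, versus n_views * d for concatenation).
    return np.mean(view_feats, axis=0)

rng = np.random.default_rng(1)
cloud = rng.normal(size=(2000, 3))
views = render_depth_views(cloud)
feats = [v.ravel() for v in views]              # toy per-view "features"
fused = aggregate(feats)
print(len(feats) * feats[0].size, fused.size)   # 4096 vs 1024
```

Mean pooling is only one possible training-free aggregator; the point of the sketch is that a sample-level fusion of geometry into images keeps the fused representation compact and visually inspectable.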


CLC Number:

 TP391

Call Number:

 2025-016-0270

Open Access Date:

 2025-09-29
