Modern Manufacturing Engineering ›› 2026, Vol. 547 ›› Issue (4): 103-116. doi: 10.16731/j.cnki.1671-3133.2026.04.013

• Instrumentation / Inspection / Monitoring •

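Steel surface defect detection method based on improved YOLOv11 and multimodal fusion mechanism*

XU Huiling1, SHANG Guanyu2

  1. School of Information Engineering, Kaifeng University, Kaifeng 475004, China;
  2. School of Telecommunications and Intelligent Manufacturing, Zhengzhou Sias University, Zhengzhou 451150, China
  • Received: 2025-05-28 Published: 2026-05-07
  • Corresponding author: SHANG Guanyu, master's degree, associate professor; research interests: artificial intelligence, computer science and technology. E-mail: sgyu80@126.com
  • About the author: XU Huiling, master's degree, lecturer; research interests: computer application technology and artificial intelligence. E-mail: huilingx@126.com
  • Funding:
    *Henan Province Science and Technology Development Program Project (252102110375); Henan Province Specialty-Innovation Integration Characteristic Demonstration Course Project (Jiaobangao [2024] No. 144-190); Henan Provincial Department of Education 2023 Industry-Education Integration Research Project for Henan Undergraduate Universities (Jiaobangao [2024] No. 13, Key Project 57)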



Abstract: To address complex background interference and defect diversity in steel surface defect detection, a detection method based on an improved YOLOv11 and a multimodal fusion mechanism was proposed. First, a dual-stream network model was designed on the basis of YOLOv11 to extract multimodal feature information from the RGB image stream and the depth image stream separately. Dilated convolution was used to enlarge the receptive field of the depth stream, and a depth enhancement module adaptively adjusted the weight of each channel to strengthen perception of the geometric structure of the steel surface. Second, a self-attention mechanism was used to increase attention to critical defect regions so that the multimodal information was fused effectively. Third, a gradient-aware region balance loss function was designed, which introduced a gradient consistency loss and a feature alignment loss and combined the loss terms as a weighted sum to improve the accuracy, robustness, and adaptability of steel surface defect detection. Finally, an adaptive multimodal collaborative quantization strategy was designed to raise the inference speed of the dual-stream network model without sacrificing detection accuracy. The experimental results showed that the proposed method outperformed several mainstream defect detection algorithms and balanced detection accuracy and speed well. The mAP reached 92.1 % and the detection speed reached 48.3 FPS, improvements of 6.7 % and 11.8 FPS over the original YOLOv11, respectively. The visualization results demonstrated its adaptability to complex environments and diverse defect types, providing technical support for automated quality inspection in steel manufacturing and processing.
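
The abstract states that dilated convolution enlarges the receptive field of the depth stream. A minimal sketch of that general effect follows, assuming stride-1 layers; the kernel sizes and dilation rates are illustrative assumptions and are not taken from the paper, and the helper names `effective_kernel` and `receptive_field` are hypothetical.

```python
from typing import Iterable, Tuple


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span (in input pixels) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel_size, dilation) pairs. With stride 1,
    each layer widens the receptive field by (effective kernel size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# Illustrative configurations only (assumed, not the paper's settings).
plain_stack = [(3, 1), (3, 1), (3, 1)]
dilated_stack = [(3, 1), (3, 2), (3, 4)]

plain_rf = receptive_field(plain_stack)
dilated_rf = receptive_field(dilated_stack)
print(plain_rf, dilated_rf)
```

Both stacks use the same number of 3x3 weights per layer, so the wider receptive field of the dilated stack comes without extra parameters, which is the usual motivation for dilation.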

Key words: steel surface defect detection, YOLOv11, dual-stream architecture, multimodal
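
The abstract describes the loss as a weighted sum of several terms, including a gradient consistency loss and a feature alignment loss. The sketch below shows only that weighted-sum step under stated assumptions: the term names, the example values, and the weights are all hypothetical, and the paper's actual term definitions and coefficients are not given on this page.

```python
from typing import Mapping


def weighted_total_loss(terms: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Combine named loss terms as a weighted sum.

    Raises KeyError if a loss term has no matching weight, so a missing
    coefficient is caught instead of being silently treated as zero.
    """
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for loss terms: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


# Hypothetical per-batch loss values and coefficients (assumptions).
example_terms = {
    "detection": 1.20,
    "gradient_consistency": 0.40,
    "feature_alignment": 0.30,
}
example_weights = {
    "detection": 1.0,
    "gradient_consistency": 0.5,
    "feature_alignment": 0.25,
}

total = weighted_total_loss(example_terms, example_weights)
print(round(total, 3))
```

Keeping the terms in a named mapping makes it straightforward to log each component separately when tuning the coefficients.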

