Modern Manufacturing Engineering, 2025, Vol. 539, Issue (8): 93-99. doi: 10.16731/j.cnki.1671-3133.2025.08.010

• CAD/CAE/CAPP/CAM •

Standardization method of process text based on BM25 and DSSM algorithm

ZHANG Jinlong (1,2), GAO Qi (1,2,3), WU Chunyang (1,2), ZHAI Jianfeng (1,2), LI Wenqi (1,2)

  1. School of Mechanical Engineering, Shandong University, Jinan 250061, China;
  2. Key Laboratory of High Efficiency and Clean Mechanical Manufacture (Shandong University), Ministry of Education, Jinan 250061, China;
  3. Rizhao Institute of Shandong University, Rizhao 276827, China
  • Received: 2025-02-05 Online: 2025-08-18 Published: 2025-09-09
  • Corresponding author: GAO Qi, Ph.D., professor; main research interests: product lifecycle management, knowledge engineering, intelligent design, and industrial robots. E-mail: gaoqi@sdu.edu.cn
  • About the author: ZHANG Jinlong, master's student; main research interests: process data management, process design, and knowledge graphs.
  • Funding:
    National Key Research and Development Program of China (2018YFB1702601)

Abstract: Standardizing process text data is crucial for data integration and reuse in manufacturing. To address the non-standard and inconsistent descriptions of process text data within manufacturing enterprises, a method combining unsupervised data matching with supervised-learning data matching is proposed; it integrates the BM25 algorithm and the DSSM algorithm to achieve low-cost standardization of process text data. First, process text data are obtained from the enterprise's process data management system and preprocessed, and an enterprise data dictionary is constructed according to the enterprise's actual situation. Next, the unsupervised BM25 algorithm coarsely matches small batches of process text data against the enterprise data dictionary at the text-similarity level, and experts verify the coarse matching results to generate a training dataset. Finally, the training dataset is used to train the supervised DSSM algorithm, which performs fine matching of process text data at the semantic-similarity level. The method was validated on a process-name standardization task at a home appliance manufacturer, which demonstrated its effectiveness. The method can effectively reduce the labor cost of standardizing process text data in manufacturing enterprises while preserving the accuracy of the standardization process as far as possible.

Key words: computer integrated manufacturing, manufacturing, process text data, standardization, text matching, deep learning
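
As a rough illustration of the coarse-matching stage described in the abstract, the following is a minimal sketch of Okapi BM25 scoring between a raw process description and the entries of a data dictionary. It is not the authors' implementation: the whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, the smoothed IDF variant, the class and method names, and the sample dictionary entries are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens. The preprocessing used in the
    # paper is not described on this page.
    return text.lower().split()


class BM25:
    """Okapi BM25 index over a small list of dictionary entries (hypothetical helper)."""

    def __init__(self, entries, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [tokenize(e) for e in entries]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of entries containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF that stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        """BM25 score of dictionary entry i for the given raw text."""
        tf = self.tfs[i]
        dl = len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=3):
        """Return up to k (entry_index, score) pairs with a positive score, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.docs))]
        scored = [pair for pair in scored if pair[1] > 0]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:k]


if __name__ == "__main__":
    # Hypothetical standard process names standing in for an enterprise data dictionary.
    dictionary = ["drill hole", "spray painting", "final assembly", "leak test"]
    index = BM25(dictionary)
    for entry_index, s in index.top_k("drill the mounting hole"):
        print(dictionary[entry_index], round(s, 3))
```

In a workflow like the one the abstract outlines, candidate pairs produced this way would be checked by experts before being used as training data for the supervised matching stage; an empty candidate list signals a description that shares no terms with any dictionary entry.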


Copyright © Editorial Office of Modern Manufacturing Engineering