Standardization method of process text based on BM25 and DSSM algorithm

doi:10.16731/j.cnki.1671-3133.2025.08.010

Modern Manufacturing Engineering, 2025, Vol. 539, Issue (8): 93-99.

• CAD/CAE/CAPP/CAM •

ZHANG Jinlong^1,2, GAO Qi^1,2,3, WU Chunyang^1,2, ZHAI Jianfeng^1,2, LI Wenqi^1,2

1 School of Mechanical Engineering, Shandong University, Jinan 250061, China;
2 Key Laboratory of High Efficiency and Clean Mechanical Manufacture (Shandong University), Ministry of Education, Jinan 250061, China;
3 Rizhao Institute of Shandong University, Rizhao 276827, China

Received: 2025-02-05 Online: 2025-08-18 Published: 2025-09-09
Corresponding author: GAO Qi, Ph.D., professor; main research interests: product lifecycle management, knowledge engineering, intelligent design, and industrial robots. E-mail: gaoqi@sdu.edu.cn.
About the author: ZHANG Jinlong, master's student; main research interests: process data management, process design, and knowledge graphs.
Funding:
National Key Research and Development Program of China (2018YFB1702601)

Abstract

The standardization of process text data is crucial for data integration and reuse in manufacturing. To address the non-standard and inconsistent descriptions of process text data within manufacturing enterprises, a method combining unsupervised data matching and supervised learning data matching is proposed, which integrates the BM25 algorithm and the DSSM algorithm to achieve low-cost standardization of process text data. First, process text data are obtained from the enterprise's process data management system and preprocessed, and an enterprise data dictionary is constructed according to the enterprise's actual situation. Next, the unsupervised BM25 algorithm is used to coarsely match small batches of process text data against the enterprise data dictionary at the text similarity level, and experts verify the coarse matching results to generate a training dataset. Finally, the training dataset is used to train the supervised DSSM algorithm, which achieves fine matching of process text data at the semantic similarity level. The method was validated on a process name standardization task at a home appliance manufacturer, demonstrating its effectiveness. It can effectively reduce the labor cost of standardizing process text data in manufacturing enterprises while ensuring the accuracy of the standardization process to the greatest extent possible.

Key words: computer integrated manufacturing, manufacturing, process text data, standardization, text matching, deep learning
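
As a rough illustration of the coarse-matching step described in the abstract, the sketch below scores raw process names against dictionary entries with a textbook Okapi BM25 formula. It is not the authors' implementation: the character-bigram tokenizer, the parameter values k1 = 1.5 and b = 0.75, the smoothed IDF variant, the helper names, and all sample strings are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text, n=2):
    """Split a string into lowercase character n-grams (an assumed tokenizer)."""
    s = text.lower().replace(" ", "")
    if len(s) <= n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class BM25:
    """Textbook Okapi BM25 over a small list of dictionary entries."""

    def __init__(self, docs, k1=1.5, b=0.75):  # k1 and b are assumed defaults
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(docs)
        # Document frequency: number of entries containing each token.
        df = Counter(t for tf in self.tfs for t in tf)
        n_docs = len(docs)
        # Smoothed, non-negative IDF variant (an assumption, one of several in use).
        self.idf = {
            t: math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)
            for t, n in df.items()
        }

    def score(self, query, i):
        """BM25 score of dictionary entry i for the given query string."""
        tf, dl = self.tfs[i], self.lens[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for t in tokenize(query):
            f = tf.get(t, 0)
            if f:
                total += self.idf[t] * f * (self.k1 + 1) / (f + norm)
        return total


def coarse_match(raw_names, dictionary, top_k=3):
    """Return the top_k dictionary candidates for each raw process name."""
    bm25 = BM25(dictionary)
    result = {}
    for name in raw_names:
        scored = [(bm25.score(name, i), term) for i, term in enumerate(dictionary)]
        scored.sort(key=lambda p: p[0], reverse=True)
        result[name] = [(term, round(s, 3)) for s, term in scored[:top_k]]
    return result


# Hypothetical dictionary entries and raw process names.
dictionary = ["spot welding", "spray painting", "final assembly", "leak testing"]
raw = ["spot weld of bracket", "paint spraying", "assy final"]
candidates = coarse_match(raw, dictionary, top_k=2)
for name, cands in candidates.items():
    print(name, "->", cands)
```

In a workflow like the one the abstract outlines, the top-ranked candidates for each raw name would be shown to an expert, and the confirmed pairs would then serve as labeled training data for the DSSM stage.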


ZHANG Jinlong, GAO Qi, WU Chunyang, ZHAI Jianfeng, LI Wenqi. Standardization method of process text based on BM25 and DSSM algorithm[J]. Modern Manufacturing Engineering, 2025, 539(8): 93-99.

