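Zhao Huizi, Zhou Yifan, Duan Hao, Zhao Hongli, Zhang Dong. Research on named entity recognition method oriented to hydrological model knowledge learning[J]. Journal of China Institute of Water Resources and Hydropower Research, 2023, 21(6): 574-585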
Research on named entity recognition method oriented to hydrological model knowledge learning
Submitted: 2023-05-06
DOI:10.13244/j.cnki.jiwhr.20230079
Keywords: hydrological model knowledge; natural language processing; named entity recognition; BERT model; pattern recognition; knowledge extraction
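Funding: Science and Technology Innovation 2030 Major Project (2021ZD0113602); China Knowledge Centre for Engineering Sciences and Technology project (CKCEST-2021-2-12, CKCEST-2022-1-35)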
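Authors, affiliations, and e-mail:
Zhao Huizi: China Institute of Water Resources and Hydropower Research, Beijing 100038; Key Laboratory of Digital Twin Watershed, Ministry of Water Resources, Beijing 100038
Zhou Yifan: China Institute of Water Resources and Hydropower Research, Beijing 100038; Key Laboratory of Digital Twin Watershed, Ministry of Water Resources, Beijing 100038
Duan Hao: China Institute of Water Resources and Hydropower Research, Beijing 100038; Key Laboratory of Digital Twin Watershed, Ministry of Water Resources, Beijing 100038
Zhao Hongli: China Institute of Water Resources and Hydropower Research, Beijing 100038; Key Laboratory of Digital Twin Watershed, Ministry of Water Resources, Beijing 100038; zhaohl@iwhr.com
Zhang Dong: Dalian Maritime University, Dalian, Liaoning 116026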
Abstract:
      To study methods for automatically extracting knowledge from text when building knowledge graphs in the water conservancy domain, this paper takes the extraction of hydrological model knowledge as an example, covering model names, simulated elements, application basins, calculation periods, accuracy, inheritance-development relationships, developers, and developing institutions. Using 883 Chinese journal papers on hydrological models as the data source, it builds a multi-strategy named entity recognition method for hydrological models that combines the BERT-Base-Chinese model, the LAC (Lexical Analysis of Chinese) tool, and pattern recognition. The papers were manually annotated with the five-tag BMOES sequence labeling scheme to build the input dataset for knowledge extraction, which was used both to train the BERT model and to evaluate the multi-strategy method. The results show that the F1 score (the harmonic mean of precision and recall) of the multi-strategy method reaches 90% or higher for all eight categories of hydrological model entities, and that choosing a recognition method for each entity category improves recognition performance over using the BERT model alone. The proposed method can serve as a reference for knowledge extraction in other water conservancy scenarios and can support the construction of domain knowledge graphs.
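To make the BMOES annotation and the F1 metric mentioned in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes character-level tagging of Chinese text; the example sentence, the entity labels `MODEL` and `BASIN`, and all function names are hypothetical and chosen only for illustration. F1 is computed at the entity level from exact span matches, which is one common convention and may differ from the paper's evaluation protocol.

```python
from typing import List, Tuple

# (start, end_exclusive, entity_label); the labels used below are hypothetical.
Span = Tuple[int, int, str]


def spans_to_bmoes(length: int, spans: List[Span]) -> List[str]:
    """Convert character-level entity spans into BMOES tags.

    B = begin, M = middle, E = end, S = single-character entity, O = outside.
    """
    tags = ["O"] * length
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


def bmoes_to_spans(tags: List[str]) -> List[Span]:
    """Decode BMOES tags back into spans; malformed fragments are dropped."""
    spans: List[Span] = []
    start, current = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, label))
            start, current = None, None
        elif prefix == "B":
            start, current = i, label
        elif prefix == "M" and start is not None and current == label:
            continue
        elif prefix == "E" and start is not None and current == label:
            spans.append((start, i + 1, label))
            start, current = None, None
        else:  # "O" or an inconsistent tag: abandon any open entity
            start, current = None, None
    return spans


def precision_recall_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Entity-level scores; a prediction counts only if span and label match exactly."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative sentence (not taken from the paper's dataset):
# "The Xin'anjiang model is applied to the Huaihe River basin."
text = "新安江模型应用于淮河流域"
gold_spans = [(0, 5, "MODEL"), (8, 12, "BASIN")]
gold_tags = spans_to_bmoes(len(text), gold_spans)

# A hypothetical prediction that truncates the basin name by one character.
pred_spans = [(0, 5, "MODEL"), (8, 11, "BASIN")]
scores = precision_recall_f1(gold_spans, pred_spans)
```

Here one of the two predicted entities matches a gold entity exactly, so precision, recall, and F1 are all 0.5. Exact-match scoring is strict: the truncated basin name earns no partial credit.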