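ZHAO Chenxiao, XUE Huifeng, WANG Lei, WAN Yi. Water consumption abnormal data detection method based on isolation forest algorithm[J]. Journal of China Institute of Water Resources and Hydropower Research, 2020, 18(1): 31-39.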
Water Consumption Abnormal Data Detection Method Based on Isolation Forest
Received: 2018-10-18
DOI:10.13244/j.cnki.jiwhr.2020.01.004
Keywords: water resources monitoring; abnormal data; average interpolation; isolation forest; least squares fitting
Funding: Key Project of the National Natural Science Foundation of China (U1501253)
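Author affiliations
ZHAO Chenxiao: China Aerospace Academy of Systems Science and Engineering, Beijing 100048
XUE Huifeng: China Aerospace Academy of Systems Science and Engineering, Beijing 100048
WANG Lei: China Aerospace Academy of Systems Science and Engineering, Beijing 100048
WAN Yi: Water Resources Management Center, Ministry of Water Resources, Beijing 100053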
Abstract:
Water resources management systems store massive amounts of water consumption data, and screening these data for outliers to locate abnormal water intake behavior is an important means of water resources regulation. Outliers in water consumption data generally lack a clear definition, and traditional outlier detection algorithms fall short in real-time performance and stability. After summarizing the types and characteristics of abnormal water consumption data at the present stage, this paper first applies the average interpolation method to preprocess outliers that can be identified visually. It then trains on random samples drawn from the preprocessed data to build multiple isolation binary trees, which together form an isolation forest, and uses this forest as the tool for detecting outliers in the data samples. An empirical analysis of two consecutive years of daily water intake monitoring data from a water supply company shows that the isolation-forest-based method stores the characteristics of the data samples in the forest through unsupervised learning and is therefore more stable. It accurately detects the outliers in the data samples and achieves a higher detection rate than the traditional least squares fitting method, making it suitable for real-time monitoring of water resources data.
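The two-stage procedure in the abstract (average-interpolation preprocessing, then an isolation forest built from random subsamples) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define which values count as visually identifiable outliers, so the rule used here (missing or negative readings) is hypothetical, as are the helper names `average_interpolate`, `build_tree`, `path_length`, and `IsolationForest`. The sketch also assumes a single feature (the daily intake volume) and borrows the defaults of 100 trees and a subsample size of 256 from the standard isolation forest formulation (Liu, Ting and Zhou, 2008). Each tree splits its subsample at uniformly random thresholds; a reading that becomes isolated after only a few splits has a short mean path length E[h(x)], and its score s = 2^(-E[h(x)]/c(psi)) approaches 1.

```python
import math
import random
from typing import List, Optional, Tuple

EULER_GAMMA = 0.5772156649015329
# A tree is either ("leaf", size) or ("node", split, left_subtree, right_subtree).
Tree = Tuple


def average_interpolate(series: List[Optional[float]]) -> List[float]:
    """Replace readings that are missing or negative (a hypothetical
    'visually identifiable' rule) with the mean of the nearest valid
    reading on each side."""
    valid = [v is not None and v >= 0 for v in series]
    if not any(valid):
        raise ValueError("series has no valid readings")
    out: List[float] = []
    for i, v in enumerate(series):
        if valid[i]:
            out.append(float(v))
            continue
        left = next((series[j] for j in range(i - 1, -1, -1) if valid[j]), None)
        right = next((series[j] for j in range(i + 1, len(series)) if valid[j]), None)
        neighbours = [x for x in (left, right) if x is not None]
        out.append(sum(neighbours) / len(neighbours))
    return out


def c(n: int) -> float:
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used to normalise path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(xs: List[float], depth: int, limit: int, rng: random.Random) -> Tree:
    """Recursively split at a uniformly random threshold until a node holds
    at most one point, all its values are equal, or the height limit is hit."""
    if depth >= limit or len(xs) <= 1:
        return ("leaf", len(xs))
    lo, hi = min(xs), max(xs)
    if lo == hi:
        return ("leaf", len(xs))
    split = rng.uniform(lo, hi)
    left = [x for x in xs if x < split]
    right = [x for x in xs if x >= split]
    return ("node", split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))


def path_length(x: float, tree: Tree, depth: int = 0) -> float:
    """Edges from the root to the leaf reached by x, plus c(size) to account
    for the subtree that was not grown below a non-singleton leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, split, left, right = tree
    return path_length(x, left if x < split else right, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees: List[Tree] = []
        self.psi = 0

    def fit(self, xs: List[float]) -> "IsolationForest":
        if len(xs) < 2:
            raise ValueError("need at least two readings")
        self.psi = min(self.sample_size, len(xs))
        limit = math.ceil(math.log2(self.psi))  # height limit from the standard formulation
        self.trees = [build_tree(self.rng.sample(xs, self.psi), 0, limit, self.rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: float) -> float:
        """Anomaly score in (0, 1]; values near 1 indicate likely outliers."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))


if __name__ == "__main__":
    # Hypothetical daily intake series (not data from the paper).
    raw: List[Optional[float]] = [100.0 + i % 7 for i in range(60)]
    raw[10], raw[20], raw[45] = None, -5.0, 400.0
    clean = average_interpolate(raw)
    forest = IsolationForest(sample_size=64, seed=42).fit(clean)
    scores = [forest.score(x) for x in clean]
    worst = max(range(len(clean)), key=scores.__getitem__)
    print(f"day {worst}: value {clean[worst]}, score {scores[worst]:.3f}")
```

Scores near 1 point to likely anomalies, while scores well below 0.5 indicate ordinary readings. Turning scores into alarms (a fixed cutoff, or a ranked list for manual review) is a separate design decision that the abstract does not specify.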
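For comparison, a least squares fitting baseline of the kind the abstract contrasts against might look like the sketch below. The abstract does not say which curve was fitted or how residuals were thresholded, so the straight-line trend over the day index, the k-sigma residual rule (default k = 3), and the function name `least_squares_outliers` are all assumptions made for illustration.

```python
import math
from typing import List


def least_squares_outliers(ys: List[float], k: float = 3.0) -> List[int]:
    """Fit y = a + b*t by ordinary least squares (t = day index) and return
    the indices whose residual exceeds k times the residual standard error.
    The linear model and the k-sigma rule are illustrative assumptions."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three readings")
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(ys) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    b = s_ty / s_tt             # slope
    a = mean_y - b * mean_t     # intercept
    residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]


if __name__ == "__main__":
    ys = [100.0 + i % 7 for i in range(60)]  # hypothetical series
    ys[45] = 400.0
    print(least_squares_outliers(ys))
```

Because sigma is estimated from residuals that include the anomalies themselves, a few large spikes inflate the threshold and can mask smaller ones. This masking effect is a general limitation of residual-threshold baselines, not a result reported in the paper.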