Research on an Automatic Alum Dosing Model for Water Plants in the Context of Big Data

Dai Hong1; Zhu Enwen2; Li Jinping3; Cao Jun2; Yu Bojun2

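Cite this article: Dai Hong1, Zhu Enwen2, Li Jinping3, Cao Jun2, Yu Bojun2. Research on an Automatic Alum Dosing Model for Water Plants in the Context of Big Data [J]. 经济数学, 2020, (4): 182-189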

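Authors	Affiliations
Dai Hong1, Zhu Enwen2, Li Jinping3, Cao Jun2, Yu Bojun2	(1. Changsha Water Group Co., Ltd., Changsha, Hunan 410015; 2. School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha, Hunan 410114; 3. School of Science, Hainan University, Haikou, Hainan 570228)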

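Abstract (translated from the Chinese): Based on historical alum consumption data from the Second Water Plant of a municipal water supply company, dynamic models are built for the alum dosing flow as a function of raw water turbidity, temperature and other variables. Data processing yields 10,900 qualified records with efficient purification results, which are split into a training set and a test set. In the regression approach, raw water turbidity is divided into three intervals ("low", "medium" and "high" turbidity) according to the R² of the fit. A nonlinear variable substitution based on the Taylor expansion is then used to build a separate polynomial regression model for each interval, giving a prediction accuracy of about 72% and a reduction of about 9.6% in the total alum dosing flow. In the random forest approach, the same 10,900 qualified records are used: a random forest of 2,000 decision trees is built from the training set with raw water turbidity, pH, raw water flow and water temperature as input variables, giving a prediction accuracy of about 44.21% and an increase of 0.04% in the total alum dosing flow. In terms of goodness of fit on the qualified data, the random forest model performs better than the nonlinear regression model, and it is also better on evaluation metrics such as mean absolute error and mean absolute percentage deviation. However, in terms of the results of checking against historical data, interpretability, ease of operation and suitability for wider adoption, the piecewise bivariate nonlinear regression model has the more prominent advantage.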

Keywords (translated from the Chinese): dynamic alum consumption model; random forest model; nonlinear regression model
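The piecewise regression described in the abstract can be illustrated with a small sketch. It is not the authors' model: the cut points `LOW_MAX` and `MID_MAX`, the two scaling constants, the choice of terms in `features`, and all function names are assumptions made for illustration, since the abstract gives no numerical details. The sketch shows the general idea of variable substitution: a squared turbidity term is treated as an extra input, so an ordinary linear least-squares fit yields a polynomial model for each turbidity band.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical constants: the abstract states neither cut points nor scaling.
LOW_MAX = 10.0            # assumed upper bound of the "low" turbidity band
MID_MAX = 50.0            # assumed upper bound of the "medium" turbidity band
TURBIDITY_SCALE = 100.0   # assumed scaling to keep the normal equations well conditioned
TEMPERATURE_SCALE = 10.0  # assumed scaling for water temperature

Sample = Tuple[float, float, float]  # (turbidity, temperature, alum dosing flow)


def band(turbidity: float) -> str:
    """Assign a turbidity value to one of the three bands."""
    if turbidity < LOW_MAX:
        return "low"
    if turbidity < MID_MAX:
        return "medium"
    return "high"


def features(turbidity: float, temperature: float) -> List[float]:
    """Variable substitution: s*s becomes an ordinary linear regressor."""
    s = turbidity / TURBIDITY_SCALE
    v = temperature / TEMPERATURE_SCALE
    return [1.0, s, s * s, v]


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def fit_band(rows: List[List[float]], targets: List[float]) -> List[float]:
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def fit_piecewise(samples: Sequence[Sample]) -> Dict[str, List[float]]:
    """Fit one coefficient vector per turbidity band.

    Each band needs enough varied samples, otherwise the system is singular.
    """
    grouped: Dict[str, Tuple[List[List[float]], List[float]]] = {}
    for turbidity, temperature, dose in samples:
        rows, targets = grouped.setdefault(band(turbidity), ([], []))
        rows.append(features(turbidity, temperature))
        targets.append(dose)
    return {name: fit_band(rows, targets) for name, (rows, targets) in grouped.items()}


def predict(models: Dict[str, List[float]], turbidity: float, temperature: float) -> float:
    """Predict the dosing flow with the model of the matching band."""
    coefficients = models[band(turbidity)]
    return sum(c * f for c, f in zip(coefficients, features(turbidity, temperature)))
```

Calling `fit_piecewise` on a list of `(turbidity, temperature, dose)` tuples returns one coefficient list per band, and `predict` selects the band from the turbidity value before evaluating the polynomial. Keeping the models as plain coefficient lists reflects the interpretability argument made in the abstract: each band's dosing rule can be read off directly.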

Research on Dosing Coagulation Models in Waterworks under the Background of Big Data

Abstract: Based on the historical alum consumption data of the second water plant of a municipal water supply company, dynamic alum consumption models relating the alum dosing flow to raw water turbidity, temperature and other variables were established. Processing the data yielded 10,900 qualified records with efficient purification results, and the selected data were divided into a training sample set and a test sample set. In the regression fitting, the turbidity of raw water was divided into three intervals ("low turbidity", "medium turbidity" and "high turbidity") according to the R² of the fit. Using the nonlinear variable substitution of the Taylor expansion formula, a different polynomial regression model was established for each of the three intervals, giving a prediction accuracy of about 72% and a reduction of about 9.6% in the total alum consumption. In the random forest model, using the 10,900 qualified records and the training sample set, with "raw water turbidity", "pH value", "raw water flow" and "water temperature" as input variables, a random forest containing 2,000 decision trees was established, giving a prediction accuracy of about 44.21% and an increase of 0.04% in the total alum consumption. From the viewpoint of goodness of fit to the qualified data, the random forest model performs better than the nonlinear regression model, and the former is also superior to the latter in terms of mean absolute error, mean absolute percentage deviation and other evaluation indexes. However, from the perspective of the historical data test results, the interpretability of the model, its operational difficulty and its potential for wider adoption, the advantage of the piecewise bivariate nonlinear regression model is more prominent.

Keywords: dynamic alum consumption model; random forest model; nonlinear regression model
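The evaluation indexes named in the abstract can be written out as a short sketch. The abstract does not give its exact formulas, so the definitions below are the conventional ones and the function names are hypothetical. The "prediction accuracy" figure is left out because the abstract does not say how a prediction is judged correct.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| over all records."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |actual - predicted| / |actual|, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def total_change_percent(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Change in total dosing, in percent; negative means the model would dose less."""
    return 100.0 * (sum(predicted) - sum(actual)) / sum(actual)
```

Under these assumed definitions, a result such as "total alum consumption reduced by about 9.6%" would correspond to `total_change_percent` returning roughly -9.6 when the model's recommended doses are compared with the historical ones.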
