A Method for Identifying Class-Imbalanced Spam Reviews Based on the BalanceCascade-GBDT Algorithm

陶朝杰; 杨进

引用本文：陶朝杰,杨进.基于BalanceCascade-GBDT算法的类别不平衡虚假评论识别方法[J].经济数学,2020,(3):214-220

Authors	Affiliation
陶朝杰, 杨进	(College of Science, University of Shanghai for Science and Technology, Shanghai 200093)

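Abstract (translated from the Chinese): Spam reviews are an unavoidable problem in the development of e-commerce. To address the class imbalance among samples in online review data, a spam review identification method based on the BalanceCascade-GBDT algorithm is proposed. The BalanceCascade algorithm progressively shrinks the sample space of the majority class by setting the false alarm rate of each classifier, and then integrates all base classifiers into the final classifier. GBDT is widely used in classification problems for its high accuracy and interpretability and, as an algorithm that is unstable under sample perturbation, it is a well-suited base classification model. The model is built on the Yelp review dataset, uses AUC as the evaluation metric, and is compared with logistic regression, random forest, and neural network algorithms; the experiments demonstrate the effectiveness of the method.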

Keywords (translated from the Chinese): spam review; class imbalance; GBDT; BalanceCascade; machine learning

Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm

Abstract: Spam reviews are an unavoidable problem in the development of e-commerce. To address the class-imbalance problem in online review data, this paper proposes a BalanceCascade-GBDT method for detecting spam reviews. BalanceCascade sets the false alarm rate of each classifier to gradually shrink the sample space of the majority class, and then ensembles all base classifiers to build the final classifier. GBDT is widely used in classification because of its high accuracy and good interpretability, and because it is sensitive to perturbations of the sample data, it is a suitable base classifier. Evaluated by AUC and compared against three machine learning algorithms, the proposed method is shown to be effective.

Keywords: spam review; class-imbalance; GBDT; BalanceCascade; machine learning
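
The cascade described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The per-round false alarm rate follows the commonly described BalanceCascade formulation, (|P|/|N|)^(1/(T-1)) for minority set P, majority set N and T rounds, which the abstract itself does not spell out. The base learner `centroid_scorer` is a toy stand-in for GBDT, and the names `balance_cascade`, `auc`, `minority` and `majority`, as well as the synthetic data, are hypothetical. With scikit-learn available, a `GradientBoostingClassifier` could presumably be plugged in as the base learner instead.

```python
import random

def centroid_scorer(pos, neg):
    """Toy base learner standing in for GBDT (assumption): returns a scoring
    function that is higher for points closer to the minority-class centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_pos, c_neg = mean(pos), mean(neg)
    def dist2(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return lambda x: dist2(x, c_neg) - dist2(x, c_pos)

def balance_cascade(pos, neg, rounds=4, fit=centroid_scorer, seed=0):
    """Train `rounds` base scorers (rounds >= 2), shrinking the majority set each round."""
    rng = random.Random(seed)
    # Per-round false alarm rate, chosen so that about |pos| majority samples
    # remain after rounds - 1 removals.
    fpr = (len(pos) / len(neg)) ** (1.0 / (rounds - 1))
    remaining = list(neg)
    stages = []
    for _ in range(rounds):
        # Balanced training set: all minority samples plus an equal-sized majority subset.
        subset = rng.sample(remaining, min(len(pos), len(remaining)))
        score = fit(pos, subset)
        # Rank the remaining majority samples, most minority-like first.
        ranked = sorted(remaining, key=score, reverse=True)
        keep = min(len(ranked), max(len(pos), round(fpr * len(ranked))))
        theta = score(ranked[keep - 1])  # threshold giving roughly the target false alarm rate
        stages.append((score, theta))
        remaining = ranked[:keep]        # drop majority samples scored below the threshold
    # Final classifier: sum of thresholded base scores; positive means "minority class".
    return lambda x: sum(score(x) - theta for score, theta in stages)

def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Synthetic imbalanced data (hypothetical): 20 minority vs. 400 majority points in 2-D.
data_rng = random.Random(1)
minority = [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(20)]
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(400)]

model = balance_cascade(minority, majority)
train_auc = auc([model(x) for x in minority], [model(x) for x in majority])
print(f"training AUC: {train_auc:.3f}")
```

The AUC printed here is computed on the training data and only checks that the sketch runs end to end; a real evaluation would score a held-out split, as any comparison against logistic regression, random forest or neural network baselines would require.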
