A Method for Identifying Fake Reviews under Class Imbalance Based on the BalanceCascade-GBDT Algorithm
Cite this article: 陶朝杰, 杨进. Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm [J]. 经济数学, 2020, (3): 214-220
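Abstract (translated from Chinese): Fake reviews are an unavoidable problem in the development of e-commerce. To address class imbalance in online review data, this paper proposes a fake-review identification method based on the BalanceCascade-GBDT algorithm. BalanceCascade sets a false-alarm rate for each classifier to shrink the majority-class sample space step by step, and then combines all base classifiers into the final classifier. GBDT is widely used for classification because of its high accuracy and interpretability, and, being an unstable learner that is sensitive to sample perturbation, it is a well-suited base classifier. The model is built on the Yelp review dataset and evaluated with AUC; comparison with logistic regression, random forest, and neural network algorithms demonstrates the effectiveness of the method.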
Keywords (translated from Chinese): fake reviews; class imbalance; GBDT; BalanceCascade; machine learning
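
The cascade described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no hyperparameters, so the function names (`balance_cascade`, `fit_centroid_scorer`), the `n_rounds` parameter, the per-round false-alarm rate formula, and the toy data are all hypothetical. To stay dependency-free, the base learner here is a simple centroid-projection scorer standing in for GBDT; in practice one would pass a `fit` callable that wraps a GBDT implementation.

```python
import random

def fit_centroid_scorer(pos, neg):
    """Stand-in base learner (NOT GBDT): project onto the line joining class centroids."""
    dim = len(pos[0])
    mu_p = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mu_n = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]
    w = [p - n for p, n in zip(mu_p, mu_n)]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))

def balance_cascade(pos, neg, n_rounds=4, fit=fit_centroid_scorer, seed=0):
    """Train n_rounds base scorers, shrinking the majority pool after each round.

    Assumes n_rounds >= 2 and len(pos) < len(neg).
    """
    rng = random.Random(seed)
    pool = list(neg)
    # Per-round false-alarm rate, chosen so the majority pool shrinks to
    # roughly len(pos) after n_rounds - 1 rounds.
    far = (len(pos) / len(pool)) ** (1.0 / (n_rounds - 1))
    members, pool_sizes = [], []
    for _ in range(n_rounds):
        pool_sizes.append(len(pool))
        # Balanced training set: all minority samples plus an equal-sized majority draw.
        subset = rng.sample(pool, min(len(pos), len(pool)))
        score = fit(pos, subset)
        # Pick the threshold so that a fraction `far` of the pool is (wrongly) flagged.
        ranked = sorted((score(x) for x in pool), reverse=True)
        keep = max(1, round(far * len(pool)))
        theta = ranked[keep - 1]
        members.append((score, theta))
        # Drop majority samples already classified correctly; keep the false alarms.
        pool = [x for x in pool if score(x) >= theta]

    def ensemble_score(x):
        # Average margin over all members; higher means more likely minority (fake).
        return sum(score(x) - theta for score, theta in members) / len(members)

    return ensemble_score, pool_sizes

# Toy data: 20 "fake" points around (3, 3), 540 "genuine" points around (0, 0).
data_rng = random.Random(1)
fake = [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(20)]
genuine = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(540)]
model, sizes = balance_cascade(fake, genuine)
print(sizes)                                      # majority pool size before each round
print(model((3.0, 3.0)) > model((-1.0, -1.0)))    # compare a fake-like and a genuine-like point
```

Each round trains on a balanced subset, so no single base classifier sees the skewed class ratio, while the shrinking pool concentrates later rounds on the genuine reviews that are hardest to tell apart from fake ones.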
Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm
Abstract: Spam reviews are an unavoidable problem in the development of e-commerce. To address the class-imbalance problem in online review data, this paper proposes a BalanceCascade-GBDT method for detecting spam reviews. BalanceCascade sets a false-alarm rate for each classifier to shrink the majority-class sample space step by step, then ensembles all base classifiers to build the final classifier. GBDT is a suitable base classifier because it is widely used in classification for its high accuracy and good interpretability, and because it is sensitive to perturbations of the training sample. Evaluated by AUC against three other machine learning algorithms, the proposed method is shown to be effective.
Keywords: spam review; class-imbalance; GBDT; BalanceCascade; machine learning
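
To illustrate why AUC suits an imbalanced evaluation, here is a minimal sketch of the pairwise (Mann-Whitney) definition: the probability that a randomly chosen spam review receives a higher score than a randomly chosen genuine one. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the paper's experiments.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy imbalanced set: 1 spam review among 20.
labels = [1] + [0] * 19

# A degenerate scorer that rates every review the same reaches 95% accuracy
# when thresholded to "all genuine", yet carries no ranking information.
constant_scores = [0.0] * 20
accuracy_all_genuine = sum(y == 0 for y in labels) / len(labels)

print(accuracy_all_genuine)
print(auc(labels, constant_scores))
```

Because the measure depends only on how positives rank against negatives, it does not reward a classifier for simply predicting the majority class.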