A Class-Imbalanced Fake Review Identification Method Based on the BalanceCascade-GBDT Algorithm
引用本文:陶朝杰,杨 进.基于BalanceCascade-GBDT算法的类别不平衡虚假评论识别方法[J].经济数学,2020,(3):214-220
Author affiliation
Tao Chaojie, Yang Jin (College of Science, University of Shanghai for Science and Technology, Shanghai 200093)
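Abstract (Chinese version, translated): Fake reviews are an unavoidable problem in the development of e-commerce. To address the class imbalance found in online review data, this paper proposes a fake review identification method based on the BalanceCascade-GBDT algorithm. BalanceCascade sets the false positive rate of each classifier so as to shrink the majority-class sample space step by step, then ensembles all base classifiers to build the final classifier. GBDT is widely used in classification for its high accuracy and interpretability, and because it is an unstable learner that is sensitive to perturbations of the training sample, it is well suited as the base classifier. The model is built on the Yelp review dataset and evaluated by AUC; comparison with logistic regression, random forest, and neural network algorithms demonstrates the effectiveness of the method.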
Keywords (Chinese version, translated): fake review  class imbalance  GBDT  BalanceCascade  machine learning
 
Detection of Class-Imbalance Spam Reviews Based on BalanceCascade-GBDT Algorithm
Abstract: Spam reviews are an unavoidable problem in the development of e-commerce. To address class imbalance in online review data, this paper proposes a BalanceCascade-GBDT method for detecting spam reviews. BalanceCascade sets the false alarm rate of each classifier to shrink the majority-class sample space step by step, then ensembles all base classifiers into the final classifier. GBDT is widely used in classification for its high accuracy and good interpretability, and because it is sensitive to perturbations of the sample data, it is a suitable base classifier. Evaluated by AUC against three other machine learning algorithms, the experiments demonstrate the effectiveness of the proposed method.
Keywords: spam review  class-imbalance  GBDT  BalanceCascade  machine learning
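
The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it follows the generic BalanceCascade recipe (train on a balanced subset, set the threshold so that a fixed fraction of the remaining majority samples are false alarms, keep only those false alarms for the next round), and every name in it is hypothetical. To stay dependency-free, `centroid_learner` is a toy stand-in for GBDT; a real gradient-boosted model (for example scikit-learn's `GradientBoostingClassifier`, an assumption since the abstract does not name an implementation) would be wrapped behind the same `fit` interface. The data is synthetic, not the Yelp dataset.

```python
import random

def centroid_learner(X, y):
    """Toy stand-in for GBDT: a higher score means closer to the positive centroid."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: dist2(x, neg) - dist2(x, pos)

def balance_cascade(P, N, T, fit, rng):
    """P: minority (spam) samples, N: majority samples, T >= 2 base classifiers.

    Returns a list of (score_function, threshold) pairs.
    """
    f = (len(P) / len(N)) ** (1.0 / (T - 1))  # false alarm rate targeted in each round
    N = list(N)
    members = []
    for _ in range(T):
        # Train on all minority samples plus an equally sized random majority subset.
        subset = rng.sample(N, min(len(P), len(N)))
        score = fit(P + subset, [1] * len(P) + [0] * len(subset))
        # Choose the threshold so that roughly a fraction f of N scores at or above it.
        ranked = sorted(N, key=score, reverse=True)
        k = max(1, round(f * len(ranked)))
        theta = score(ranked[k - 1])
        members.append((score, theta))
        # Drop the correctly rejected majority samples; only false alarms remain.
        N = ranked[:k]
    return members

def ensemble_score(members, x):
    """Sum of per-classifier margins; values >= 0 would be flagged as spam."""
    return sum(score(x) - theta for score, theta in members)

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic imbalanced data (hypothetical): 20 "spam" points versus 400 genuine ones.
rng = random.Random(0)

def draw(n, centre):
    return [[rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)] for _ in range(n)]

spam, genuine = draw(20, 2.0), draw(400, 0.0)
members = balance_cascade(spam, genuine, T=4, fit=centroid_learner, rng=rng)

# Evaluate on freshly drawn data with the same class ratio.
test_X = draw(20, 2.0) + draw(400, 0.0)
test_y = [1] * 20 + [0] * 400
test_auc = auc([ensemble_score(members, x) for x in test_X], test_y)
print(f"held-out AUC: {test_auc:.3f}")
```

With `T` rounds and a per-round false alarm rate of `(|P|/|N|)^(1/(T-1))`, the majority pool shrinks to roughly the size of the minority class after `T - 1` rounds, which is the gradual reduction of the majority sample space that the abstract refers to. AUC is a natural metric here because it depends only on the ranking of scores, not on the class ratio or on any single decision threshold.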