Research on Identifying Product Review Spam Based on Combination Classification of KNN and Bayesian Algorithms
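Cite this article: Liang Zhao, Chen Siyu, Liang Xiaolin, Kang Xin. Research on Identifying Product Review Spam Based on Combination Classification of KNN and Bayesian Algorithms[J]. 经济数学 (Journal of Quantitative Economics), 2016, (1): 36-41.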
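Abstract: Product review spam reduces, to some extent, the reference value of review information. This paper builds an identification model that removes spam from review text and retains genuine product reviews. First, the characteristics of product reviews are analyzed, and 14 features are extracted through four modules: data collection, text preprocessing, mutual information testing, and text representation. Then, exploiting the high complementarity of the KNN and Bayes algorithms, a combined classifier based on the two is built. Finally, the model is evaluated by cross-validation on product reviews of the iPhone 6 Plus, giving a correct recognition rate of 75.3%, a recall of 82.1%, and an F1 score of 77.5%.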
Keywords: KNN algorithm; Bayes algorithm; combination classifier; mutual information; cross-validation
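The abstract names a mutual information test as one of the feature-extraction modules but gives no formula. The following is a minimal sketch that assumes binary (present/absent) features and binary labels (1 = spam). It scores each feature by the mutual information I(X;Y) = Σ p(x,y) log[p(x,y) / (p(x) p(y))] and keeps the highest-scoring features. The paper may instead use the pointwise variant common in text classification. The function names, feature names, and toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X; Y) in nats between a discrete feature X and a class label Y.

    Assumption: both are equal-length sequences of discrete values,
    e.g. 0/1 for "feature absent/present" and "genuine/spam".
    """
    if not feature or len(feature) != len(labels):
        raise ValueError("feature and labels must be non-empty and equally long")
    n = len(feature)
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    px = Counter(feature)                  # marginal counts of x
    py = Counter(labels)                   # marginal counts of y
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), rewritten with raw counts
        mi += (nxy / n) * math.log(nxy * n / (px[x] * py[y]))
    return mi


def select_features(columns, labels, top_k):
    """Rank feature columns (name -> values) by MI with the labels; keep the top_k names."""
    scores = {name: mutual_information(values, labels) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: 1 = spam, 0 = genuine review
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "has_url": [1, 1, 1, 0, 0, 0],   # perfectly aligned with the label
    "is_short": [1, 0, 1, 0, 1, 0],  # only weakly related to the label
}
print(select_features(columns, labels, top_k=1))  # ['has_url']
```

Here `has_url` scores log 2 ≈ 0.693 nats and `is_short` scores about 0.057, so only `has_url` is kept.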
 
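The abstract names the two base learners and the evaluation protocol. It does not give the combination rule, the KNN distance, or the Bayes likelihood. The sketch below fills those gaps with assumptions: Euclidean KNN with k = 3, Gaussian naive Bayes over numeric features, and a soft vote that averages the KNN spam share with the Bayes posterior using a hypothetical weight w = 0.5. Predictions are pooled over k-fold cross-validation before the metrics are computed. Precision stands in for the paper's "correct recognition rate", which the abstract does not define. All function names and data are hypothetical.

```python
import math
import random

# Hypothetical sketch: the abstract does not specify how KNN and Bayes are combined;
# a weighted average of their spam probabilities (soft voting) is assumed here.


def knn_spam_prob(train_x, train_y, x, k=3):
    """Share of the k nearest training points (Euclidean distance) labelled spam (1)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature); labels must be 0 or 1."""

    def fit(self, xs, ys, eps=1e-9):
        self.stats = {}
        for c in (0, 1):
            rows = [x for x, y in zip(xs, ys) if y == c]
            if not rows:
                raise ValueError(f"training data contains no examples of class {c}")
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # eps keeps a constant feature from producing a zero variance
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + eps
                         for col, m in zip(cols, means)]
            self.stats[c] = (len(rows) / len(xs), means, variances)
        return self

    def spam_prob(self, x):
        """Posterior P(spam | x), normalised with log-sum-exp for numerical stability."""
        log_post = {}
        for c, (prior, means, variances) in self.stats.items():
            lp = math.log(prior)
            for v, m, var in zip(x, means, variances):
                lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            log_post[c] = lp
        top = max(log_post.values())
        z = sum(math.exp(lp - top) for lp in log_post.values())
        return math.exp(log_post[1] - top) / z


def combined_predict(train_x, train_y, test_x, k=3, w=0.5):
    """Soft vote: w * KNN spam share + (1 - w) * naive Bayes posterior, threshold 0.5."""
    nb = GaussianNaiveBayes().fit(train_x, train_y)
    preds = []
    for x in test_x:
        score = w * knn_spam_prob(train_x, train_y, x, k) + (1 - w) * nb.spam_prob(x)
        preds.append(1 if score >= 0.5 else 0)
    return preds


def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(xs, ys, folds=5, seed=0, **params):
    """k-fold CV; predictions from all folds are pooled before the metrics are computed."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    y_true, y_pred = [], []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        y_pred += combined_predict([xs[i] for i in train], [ys[i] for i in train],
                                   [xs[i] for i in test], **params)
        y_true += [ys[i] for i in test]
    return precision_recall_f1(y_true, y_pred)


# Hypothetical toy data with two features: 0 = genuine review, 1 = spam
xs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
      (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
ys = [0] * 5 + [1] * 5
print(cross_validate(xs, ys, folds=5, k=3, w=0.5))  # (1.0, 1.0, 1.0)
```

Averaging probabilities lets a confident vote from one learner outweigh an uncertain vote from the other. This is one simple way to exploit complementary errors, but the paper's actual rule may differ. On real review data the toy clusters would be replaced by the 14 extracted features.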