Principal Component Estimation and Empirical Analysis Based on Logistic Regression
Cite this article: Hu Qian, Hu Yao, Liu Wei. Principal Component Estimation and Empirical Analysis Based on Logistic Regression [J]. 经济数学, 2020, (4): 123-129
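Author affiliations
Hu Qian (1), Hu Yao (1, 2), Liu Wei (1)
(1) School of Mathematics and Statistics, Guizhou University, Guiyang, Guizhou 550025
(2) Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guiyang, Guizhou 550025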
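Chinese abstract (translated): The principal component estimation method is applied to estimate the parameters of the Logistic regression model and to eliminate the influence of multicollinearity. First, 6 principal components with a cumulative contribution rate above 85% are selected, and principal component estimation is carried out for the dependent variable. Next, the main factors influencing the onset of coronary heart disease are identified. Finally, a regression model is obtained relating the dependent variable (onset of coronary heart disease) to 6 main influencing factors: blood pressure (sbp), cumulative tobacco consumption (tobacco), low-density lipoprotein cholesterol (ldl), family history of heart disease (famhist), type A behavior (typea), and age at onset (age). The results show that family history of heart disease is the single largest cause of onset, and that it is an uncontrollable factor. Among the controllable factors, cumulative tobacco consumption has the greatest effect on the onset of coronary heart disease, so patients are advised to control their tobacco intake in order to keep their condition stable.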
Chinese keywords (translated): Logistic regression; multicollinearity; principal component estimation; coronary heart disease
 
Principal Component Estimation and Empirical Analysis Based on Logistic Regression
Abstract: The principal component estimation method is used to estimate the parameters of the logistic regression model and to eliminate the influence of multicollinearity. First, 6 principal components whose cumulative contribution rate exceeds 85% are selected, and principal component estimation is carried out for the dependent variable. Next, the main factors influencing the incidence of coronary heart disease are identified. Finally, a regression model is obtained relating the dependent variable (coronary heart disease incidence) to 6 main influencing factors: blood pressure (sbp), cumulative tobacco consumption (tobacco), low-density lipoprotein cholesterol (ldl), family history of heart disease (famhist), type A behavior (typea), and age of onset (age). The results show that family history of heart disease is the largest cause of heart disease and that age is the second most influential factor; both are uncontrollable. Among the controllable factors, cumulative tobacco consumption has the greatest effect on the incidence of coronary heart disease, so patients are advised to control their tobacco intake to keep their condition stable.
Keywords: Logistic regression model; multicollinearity; principal component estimation; coronary heart disease
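
The procedure summarized in the abstract (standardize the predictors, keep the leading principal components whose cumulative contribution rate reaches 85%, fit the logistic model on the component scores, then map the coefficients back to the original variables) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the function name `pc_logistic`, the Newton-Raphson fitting loop, and the synthetic data are all assumptions made for the example, and the data are not the coronary heart disease data used in the paper.

```python
import numpy as np


def pc_logistic(X, y, threshold=0.85, n_iter=25):
    """Hypothetical helper: principal component estimation for logistic regression."""
    # Standardize predictors so that PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

    # Eigen-decomposition of the correlation matrix, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Smallest number of components whose cumulative contribution reaches the threshold.
    cum_ratio = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum_ratio, threshold)) + 1
    V = eigvecs[:, :k]
    scores = Z @ V  # principal component scores (uncorrelated columns)

    # Logistic regression on the scores, fitted by Newton-Raphson (an assumed choice).
    A = np.column_stack([np.ones(len(y)), scores])
    gamma = np.zeros(k + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ gamma))
        w = p * (1.0 - p)
        grad = A.T @ (y - p)
        hess = A.T @ (A * w[:, None])
        gamma = gamma + np.linalg.solve(hess, grad)

    # Map component coefficients back to the standardized original variables.
    beta_std = V @ gamma[1:]
    return {"k": k, "cum_ratio": cum_ratio, "gamma": gamma, "beta_std": beta_std}


# Synthetic, deliberately collinear data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 400
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, 6)) + 0.3 * rng.normal(size=(n, 6))
logit = 0.8 * latent[:, 0] - 0.5 * latent[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

fit = pc_logistic(X, y)
print("components kept:", fit["k"])
print("coefficients on standardized predictors:", np.round(fit["beta_std"], 3))
```

Because the component scores are mutually uncorrelated, the fit on them does not suffer from the multicollinearity present among the original predictors; the back-transformed vector `beta_std` then gives one coefficient per standardized original variable, which is what allows the influencing factors to be ranked.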