
PREDICTING SPECIES' POTENTIAL DISTRIBUTION: SVM COMPARED WITH GARP



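Species distributions are closely linked to environmental factors, so using environmental factors as predictor variables is currently the most common approach to modeling species distributions. However, the vast majority of species distribution models run into the “high dimension, small sample size” problem, which is difficult to solve. This study shows, through both theory and practice, that the support vector machine (SVM) algorithm, which is based on the structural risk minimization principle, is well suited to classification problems of this kind. Using 20 species of Rhododendron endemic to China as test cases, with herbarium specimen data and 11 environmental raster layers at 1 km × 1 km resolution as model variables, we predicted their potential distributions in China and compared model performance through a comprehensive evaluation: expert evaluation, receiver operating characteristic (ROC) curves, and area under the curve (AUC). We implemented a species distribution prediction system built around SVM and showed experimentally that it far outperforms the widely used genetic algorithm for rule-set prediction (GARP) system in both computation speed and predictive performance.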

Aims Environmental factors strongly affect species distributions, so the most common way to build a predictive model of a species' potential distribution is to use them as predictor variables. Unfortunately, most predictive models suffer from the “high dimension, small sample size” problem and fail to give satisfactory results in many cases. The support vector machine (SVM), which is based on the structural risk minimization principle, has been shown by both theory and numerous applications to be especially suitable for such data. Our objective was to implement a new system for predicting species' potential distributions based on the SVM method.
Methods We performed a country-scale case study using 20 species of Rhododendron endemic to China, employing herbarium specimen data and 11 layers of 1 km × 1 km gridded digital environmental data. Using expert evaluation and receiver operating characteristic (ROC) curves, we compared SVM predictions with those of a commonly used modeling method, the genetic algorithm for rule-set prediction (GARP).
Important findings In the expert evaluation, all of SVM's predictions scored higher than GARP's. In the statistical analysis of the ROC curves, almost all area under the curve (AUC) values for SVM were larger than those for GARP. Furthermore, SVM's predictions were much faster to compute than GARP's. Taken together, the evaluation showed that SVM performed much better than GARP in both speed and accuracy on the “high dimension, small sample size” problem.
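
As an illustration of the AUC comparison described above, the following is a minimal Python sketch, not the authors' implementation. It computes AUC through the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen presence site receives a higher suitability score than a randomly chosen background site. The function name `auc`, the labels "model A" and "model B", and all score values are hypothetical and chosen only to show the calculation; they are not results from the study.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], background_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the fraction of (presence, background) pairs in which the
    presence site has the higher score, counting ties as one half.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores from two models at the same test sites
# (illustrative values only, not data from the paper).
model_a_presence = [0.9, 0.8, 0.7, 0.6]
model_a_background = [0.5, 0.4, 0.65, 0.2]
model_b_presence = [0.9, 0.5, 0.7, 0.3]
model_b_background = [0.6, 0.4, 0.65, 0.2]

auc_a = auc(model_a_presence, model_a_background)
auc_b = auc(model_b_presence, model_b_background)
print(f"model A AUC = {auc_a:.4f}, model B AUC = {auc_b:.4f}")
```

An AUC of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to a model that ranks every presence site above every background site, so the model with the larger value separates the two groups better on that test set. The pairwise loop is quadratic in the number of sites; for large test sets a sort-based rank computation would be the usual choice.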