Acta Agriculturae Zhejiangensis ›› 2025, Vol. 37 ›› Issue (7): 1521-1532. DOI: 10.3969/j.issn.1004-1524.20240733

• Environmental Science •

Inversion of soil total iron content using random forest model based on multi-spectral transformation and principal component analysis

JIANG Zhenlan1, CHEN Fuxun1, LUO Shuangfei1, LUO Yeqin1, SHA Jinming2,*

  1. College of Geography and Oceanography, Minjiang University, Fuzhou 350108, China
  2. School of Geographical Science, Fujian Normal University, Fuzhou 350108, China
  • Received: 2024-08-12 Online: 2025-07-25 Published: 2025-08-20
  • About the author: JIANG Zhenlan (born 1977), female, from Zhenghe, Fujian, Ph.D., professor, mainly engaged in research on remote sensing of the ecological environment and information technology. E-mail: jessie33cn@163.com
  • Corresponding author: *SHA Jinming, E-mail: jmsha@fjnu.edu.cn
  • Supported by:
    Natural Science Foundation of Fujian Province, General Program (2021J011020); Natural Science Foundation of Fujian Province, General Program (2021J011022)

Abstract:

Most hyperspectral inversion studies of soil total iron content use a single spectral variable as input, neglecting the complementarity among spectral variables. In addition, redundant information among spectral bands degrades the prediction accuracy and generalization ability of the models. To address these problems, this study took the soil total iron content of Fuzhou City, China, as the study object and proposed a random forest (RF) model optimized with combined spectral variables and principal component analysis (PCA). A combined spectral variable set was constructed from the original reflectance and its 13 mathematical transformations. For variable optimization, PCA was used in conjunction with several variable selection methods: multiple linear regression (MLR), competitive adaptive reweighted sampling (CARS), genetic algorithm (GA), successive projections algorithm (SPA), and uninformative variable elimination (UVE). RF models were then built on the optimized variable sets to predict soil total iron content. All the constructed models showed good predictive capability on the validation set, with coefficient of determination (R²) values above 0.8 and ratio of performance to deviation (RPD) values above 2.8. Among these models, CARS-PCA-RF, GA-PCA-RF and MLR-PCA-RF stood out, each with an RPD above 3 on the validation set. The CARS-PCA-RF model performed best, with an RPD of 3.292 on the validation set, highlighting the advantages and potential of combining PCA with CARS for variable selection in the hyperspectral prediction of soil total iron content. The proposed method, based on multiple spectral transformations and PCA-optimized input variables, improved the accuracy and stability of soil total iron content prediction and provides a new solution for hyperspectral prediction of regional soil total iron content.

Key words: soil total iron content, spectral transformation, random forest, principal component analysis, hyperspectral prediction
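
The two validation metrics quoted in the abstract, R² and RPD, can be illustrated with a minimal sketch. It assumes RPD is computed as the standard deviation of the observed values divided by the root mean square error of prediction, a common convention in soil spectroscopy; the abstract does not state the exact formula (for example, sample versus population standard deviation), and the values below are hypothetical, not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / (n - 1))
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return sd / rmse


# Hypothetical validation-set values (illustrative only).
observed = [30.0, 42.0, 55.0, 61.0, 48.0, 37.0]
predicted = [32.0, 40.0, 53.0, 63.0, 47.0, 39.0]

print(f"R2  = {r_squared(observed, predicted):.3f}")
print(f"RPD = {rpd(observed, predicted):.3f}")
```

Under this definition, a larger RPD means the prediction error is small relative to the natural spread of the measured values, which is why the abstract reports it alongside R².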

CLC number: