Establishment of a Hyperspectral Spectroscopy-Based Biochemical Component Detection Model for Green Tea Processing Materials
-
Abstract: Objective: To establish a rapid method for detecting the biochemical components of green tea processing materials using hyperspectral technology. Methods: A hyperspectral camera was used to image the tea materials in real time during processing and to acquire their spectral data. The moisture content, free amino acids, tea polyphenols and caffeine of the samples were measured. After spectral preprocessing, three feature extraction methods, uninformative variable elimination (UVE), competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA), were each combined with three machine learning models, partial least squares (PLS), support vector machine (SVM) and random forest (RF), to predict the moisture content, free amino acids, tea polyphenols and caffeine of the tea materials.
Results: The best combined models for moisture content, free amino acids, tea polyphenols and caffeine were UVE-RF, CARS-SVM, UVE-SVM and UVE-PLS, respectively, with coefficients of determination (R2) of 0.99, 0.92, 0.97 and 0.87, root mean square errors of cross validation (RMSECV) of 0.7615%, 0.723 μg·g−1, 0.3701% and 0.1197%, and relative percent difference (RPD) values of 10.2093, 2.5446, 3.5851 and 2.5284. Conclusion: The models show high correlation and reasonable modeling error, and can effectively detect the biochemical components of tea materials during processing. The method is non-destructive, rapid and accurate, and is expected to be widely applied in tea processing.
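The three figures of merit quoted in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in chemometrics (R2 as one minus the ratio of residual to total sum of squares, RPD as the standard deviation of the reference values divided by the RMSE); the source does not state its formulas, so these are assumptions, and the sample values below are hypothetical rather than taken from the study.

```python
import math
from statistics import mean, stdev

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Sample standard deviation of the reference values over the RMSE (assumed definition)."""
    return stdev(y_true) / rmse(y_true, y_pred)

# Hypothetical moisture values (%), not data from the study.
reference = [79.0, 72.0, 58.0, 51.0, 20.0]
predicted = [78.0, 73.0, 57.0, 52.0, 21.0]

print("RMSE:", rmse(reference, predicted))
print("R2:  ", r_squared(reference, predicted))
print("RPD: ", rpd(reference, predicted))
```

Under these definitions RPD is a ratio of two quantities in the same unit, so it carries no unit of its own, whereas the RMSE values keep the unit of the measured component.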
-
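Table 1. Average value of sample data for each processing step

| Processing step | Moisture content (%) | Tea polyphenols (%) | Amino acids (×10 μg·g−1) | Caffeine (%) |
|---|---|---|---|---|
| Fresh tea leaves | 79.28 | 23.16 | 1.86 | 3.45 |
| Spreading, 1 h | 73.17 | 24.85 | 2.12 | 3.97 |
| Spreading, 2 h | 72.34 | 26.06 | 2.49 | 4.21 |
| Spreading, 3 h | 73.19 | 24.17 | 2.16 | 4.21 |
| Spreading, 4 h | 71.83 | 22.18 | 2.19 | 3.80 |
| Fixation, 180 ℃ | 59.23 | 25.58 | 1.22 | 3.55 |
| Fixation, 200 ℃ | 57.38 | 26.80 | 1.50 | 4.07 |
| Fixation, 220 ℃ | 53.31 | 24.54 | 1.39 | 3.53 |
| Fixation, 240 ℃ | 52.78 | 24.56 | 1.47 | 3.49 |
| Fixation, 260 ℃ | 48.12 | 24.24 | 1.81 | 3.69 |
| Fixation, 280 ℃ | 44.16 | 25.34 | 2.00 | 3.70 |
| Rolling, 20 min | 51.01 | 22.39 | 2.45 | 3.75 |
| Rolling, 40 min | 50.77 | 21.66 | 2.33 | 3.40 |
| Rolling, 60 min | 51.63 | 22.24 | 2.50 | 3.59 |
| Drum shaping | 20.40 | 19.47 | 2.03 | 4.26 |
| Pan shaping | 15.45 | 25.15 | 2.19 | 3.24 |
| Drying, 70 ℃ | - | 23.14 | 2.08 | 3.57 |
| Drying, 80 ℃ | - | 21.81 | 2.08 | 3.62 |
| Drying, 90 ℃ | - | 22.88 | 2.14 | 3.54 |
| Drying, 100 ℃ | - | 22.53 | 2.12 | 3.38 |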
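Table 2. Statistical analysis of biochemical components in the training and test sets

| Biochemical component | Spectral images | Total samples | Training samples | Prediction samples | Maximum | Minimum | Mean | Standard deviation |
|---|---|---|---|---|---|---|---|---|
| Moisture content | 16 | 48 | 36 | 12 | 83.72 | 15 | 54.64 | 17.45 |
| Tea polyphenols | 20 | 60 | 43 | 17 | 26.97 | 18.86 | 23.64 | 1.8 |
| Amino acids | 20 | 60 | 43 | 17 | 2.54 | 0.83 | 2.01 | 0.37 |
| Caffeine | 20 | 65 | 50 | 15 | 5.01 | 2.84 | 3.7 | 0.34 |

Note: For caffeine, five additional groups of data were measured to ensure accuracy, and all of them were used for modeling.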
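Table 3. Band screening results

| Biochemical component | Screening method | Number of wavelengths | Wavelength range |
|---|---|---|---|
| Moisture content | UVE | 84 | 400-449, 463-494, 545-563, 642-670, 687-735, 770-783, 814-866, 935-953, 1001-1004 |
| Moisture content | CARS | 32 | 401, 583-604, 642-659, 677-683, 704-708, 766-773, 797-808, 835-839, 863-870, 918 |
| Moisture content | SPA | 14 | 442, 466, 511, 556, 608, 677, 701, 721, 787, 839, 863, 890, 935, 987 |
| Tea polyphenols | UVE | 55 | 680-708, 821-915, 935-994 |
| Tea polyphenols | CARS | 40 | 559, 597-608, 649-663, 683-689, 701-708, 721-749, 766-777, 787-797, 821-877, 939-942, 966-980, 997-1004 |
| Tea polyphenols | SPA | 16 | 404, 494, 532, 639, 690, 711, 725, 746, 770, 801, 856, 873, 894, 921, 949, 973 |
| Caffeine | UVE | 28 | 507, 511, 756-763, 901-936, 966-1004 |
| Caffeine | CARS | 18 | 432, 452, 459, 521, 590, 621, 628, 783, 787, 821, 835, 842, 887, 894, 973, 984, 987, 1004 |
| Caffeine | SPA | 11 | 442, 466, 677, 714, 750, 787, 842, 866, 897, 935, 977 |
| Amino acids | UVE | 48 | 476-497, 542-566, 649-680, 877-918, 953-970 |
| Amino acids | CARS | 15 | 418-421, 452-459, 473, 563, 573, 608-614, 659-663, 963, 970 |
| Amino acids | SPA | 14 | 401, 442, 490, 514, 556, 659, 680, 701, 721, 752, 790, 835, 887, 942 |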
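As a rough illustration of how a screening result from Table 3 could be applied to a spectrum, the sketch below parses a wavelength-range string and keeps only the sensor bands that fall inside it. The helper names `parse_bands` and `select_indices`, the evenly spaced sensor grid and the placeholder spectrum are all hypothetical; the source does not give the camera's band centres.

```python
def parse_bands(spec):
    """Turn a string such as '680-708,821-915' into (low, high) intervals.

    Single wavelengths become zero-width intervals; '~' is accepted as a
    range separator as well as '-'.
    """
    intervals = []
    for part in spec.replace("~", "-").split(","):
        low, _, high = part.strip().partition("-")
        intervals.append((float(low), float(high or low)))
    return intervals

def select_indices(wavelengths, intervals):
    """Return indices of sensor wavelengths that fall inside any interval."""
    return [i for i, w in enumerate(wavelengths)
            if any(lo <= w <= hi for lo, hi in intervals)]

# Hypothetical sensor grid (assumption): evenly spaced band centres from 400 upward.
grid = [400 + 3.5 * k for k in range(173)]

# UVE row for tea polyphenols in Table 3.
uve_polyphenols = parse_bands("680-708,821-915,935-994")
kept = select_indices(grid, uve_polyphenols)

# Placeholder reflectance spectrum; a real one would come from the camera.
spectrum = [0.0] * len(grid)
features = [spectrum[i] for i in kept]
print(len(features), "bands retained out of", len(grid))
```

The number of retained bands depends on the assumed grid, so it will not necessarily match the wavelength counts reported in the table.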
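Table 4. Summary of model results

| Biochemical component | Model | RMSEC (training set) | Rcal (training set) | RMSECV | RMSEP (prediction set) | RPD (prediction set) | Rp (prediction set) | R2 (prediction set) |
|---|---|---|---|---|---|---|---|---|
| Moisture content | UVE-PLS | 0.0155 | 0.9983 | 0.5483 | 2.2716 | 7.3261 | 0.9914 | 0.98 |
| Moisture content | CARS-SVM | 2.0253 | 0.9941 | 1.0458 | 1.3819 | 11.4795 | 0.9959 | 0.99 |
| Moisture content | UVE-RF | 1.5934 | 0.9936 | 0.7615 | 1.4747 | 10.2093 | 0.9902 | 0.99 |
| Tea polyphenols | SPA-PLS | 0.0929 | 0.9031 | 0.4427 | 0.6184 | 2.9274 | 0.9524 | 0.91 |
| Tea polyphenols | UVE-SVM | 0.6206 | 0.9356 | 0.3701 | 0.5049 | 3.5851 | 0.9694 | 0.97 |
| Tea polyphenols | UVE-RF | 0.6033 | 0.892 | 0.3598 | 0.7321 | 2.226 | 0.8814 | 0.88 |
| Amino acids | CARS-PLS | 0.0747 | 0.9405 | 0.0764 | 0.1382 | 2.6253 | 0.9264 | 0.86 |
| Amino acids | CARS-SVM | 0.1212 | 0.9478 | 0.0723 | 0.1347 | 2.5446 | 0.9247 | 0.92 |
| Amino acids | SPA-RF | 0.1148 | 0.9102 | 0.0684 | 0.1399 | 2.4942 | 0.8595 | 0.86 |
| Caffeine | UVE-PLS | 0.0954 | 0.813 | 0.1197 | 0.1078 | 2.5284 | 0.9327 | 0.87 |
| Caffeine | CARS-SVM | 0.2389 | 0.7626 | 0.1379 | 0.1344 | 1.7255 | 0.9129 | 0.91 |
| Caffeine | SPA-RF | 0.2204 | 0.6448 | 0.1273 | 0.1522 | 1.3618 | 0.7826 | 0.78 |

RMSEC: root mean square error of calibration; Rcal: correlation coefficient of the training set; RMSECV: root mean square error of cross validation; RMSEP: root mean square error of prediction; RPD: relative percent difference; Rp: correlation coefficient of the prediction set; R2: coefficient of determination.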
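Table 4 reports three different error figures: RMSEC, RMSECV and RMSEP. The sketch below shows how they differ in which samples are used for fitting and which for prediction. It is an illustration only: a single-variable least-squares line stands in for the PLS, SVM and RF models, leave-one-out is assumed as the cross-validation scheme (the source does not say which scheme was used), and all data values are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for PLS/SVM/RF)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def rmse(residuals):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def rmsec(xs, ys):
    """Calibration error: fit on all training samples, predict the same samples."""
    a, b = fit_line(xs, ys)
    return rmse([y - (a + b * x) for x, y in zip(xs, ys)])

def rmsecv(xs, ys):
    """Cross-validation error: each training sample is predicted by a model
    fitted without it (leave-one-out, an assumed scheme)."""
    residuals = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (a + b * xs[i]))
    return rmse(residuals)

def rmsep(train_x, train_y, test_x, test_y):
    """Prediction error: fit on the training set, predict the held-out set."""
    a, b = fit_line(train_x, train_y)
    return rmse([y - (a + b * x) for x, y in zip(test_x, test_y)])

# Hypothetical single-band reflectance and moisture (%) values.
train_x = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
train_y = [20.5, 29.0, 41.0, 50.5, 59.0, 71.0]
test_x = [0.25, 0.45]
test_y = [35.0, 54.0]

print("RMSEC: ", rmsec(train_x, train_y))
print("RMSECV:", rmsecv(train_x, train_y))
print("RMSEP: ", rmsep(train_x, train_y, test_x, test_y))
```

For this least-squares stand-in the leave-one-out error is never smaller than the calibration error; that property is specific to the stand-in and need not hold for the models in the table.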
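Table 5. The best results of modeling

| Biochemical component | Best method | R2 | RMSECV | RPD |
|---|---|---|---|---|
| Moisture content | UVE-RF | 0.99 | 0.7615 | 10.2093 |
| Amino acids | CARS-SVM | 0.92 | 0.0723 | 2.5446 |
| Tea polyphenols | UVE-SVM | 0.97 | 0.3701 | 3.5851 |
| Caffeine | UVE-PLS | 0.87 | 0.1197 | 2.5284 |
-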
Prediction of tea polyphenols, free amino acids and caffeine content in tea leaves during wilting and fermentation using hyperspectral imaging[J]. Foods,2022,11(16):2537. doi: 10.3390/foods11162537 [33] 于雷, 洪永胜, 周勇, 等. 高光谱估算土壤有机质含量的波长变量筛选方法[J]. 农业工程学报,2016,32(13):95−102. [YU L, HONG Y S, ZHOU Y, et al. Wavelength variable selection methods for estimation of soil organic matter content using hyperspectral technique[J]. Transactions of the Chinese Society of Agricultural Engineering,2016,32(13):95−102. [34] 吴伟斌, 刘文超, 李泽艺, 等. 基于高光谱的茶叶含水量检测模型建立与试验研究[J]. 河南农业大学学报,2018,52(5):818−824. [WU W B, LIU W C, LI Z Y, et al. Study on detection model establishment and experiment of tea water content based on hyperspectral[J]. Journal of Henan Agricultural University,2018,52(5):818−824. [35] 陈雅君. 影响茶叶品质的主要生化指标的高光谱反演研究[D]. 福州: 福建师范大学, 2017.CHENG Y J. Study on hyperspectral inversion of main biochemical indexes affecting tea quality[D]. Fuzhou: Fujian Normal University, 2017. [36] 白晓丽, 郭卫华, 孔俊豪, 等. 速溶普洱茶中水分、咖啡碱和茶多酚含量近红外光谱快速测定方法的建立[J]. 食品工业科技,2019,40(1):234−238,245. [BAI X L, GUO W H, KONG J H, et al. Establishment of a method for the rapid measurement of moisture, caffeine and tea-polyphenols in instant pu'er tea by near infrared spectroscopy[J]. Science and Technology of Food Industry,2019,40(1):234−238,245.