An Anthocyanin Prediction Model of Blueberry Pomace Based on Stacked Supervised Autoencoders
-
Abstract: Stacked supervised autoencoders (SSAE), a deep learning method, were combined with visible and near-infrared reflectance spectroscopy to model the anthocyanin content of blueberry pomace. The spectral data were first preprocessed and subjected to feature screening. Taking the lowest prediction-set root mean square error (RMSEP) of a preset SSAE model as the criterion, 178 characteristic wavelengths were selected. The absorbance values at the selected wavelengths served as the model input, and the anthocyanin content of the blueberry pomace served as the output. The activation function, number of nodes, number of training iterations, and learning rate of the SSAE model were then examined, giving the optimal configuration: ReLU activation, a 178-60-5-1 structure, 70 training iterations, and a learning rate of 0.01. The training-set root mean square error (RMSEC), RMSEP, and prediction-set correlation coefficient (Rp) were used as evaluation criteria; the resulting model achieved an RMSEC of 1.0500, an RMSEP of 0.3835, and an Rp of 0.9042. Compared with three classic regression models, namely extreme learning machine (ELM), least squares support vector regression (LSSVR), and partial least squares regression (PLSR), the SSAE model gave higher prediction accuracy. The combination of the SSAE model with visible and near-infrared reflectance spectroscopy can therefore effectively predict the anthocyanin content of blueberry pomace.
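The three evaluation criteria named in the abstract can be computed as in the minimal sketch below. It assumes that the root mean square error uses the divisor n and that Rp is the Pearson correlation between measured and predicted values; the source does not spell out either formula. The function names and the sample values are hypothetical and are not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (divisor n) between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical measured and predicted anthocyanin values (illustration only).
y_ref = [2.0, 3.0, 4.0, 5.0]
y_hat = [2.5, 2.5, 4.5, 4.5]

rmsep = rmse(y_ref, y_hat)    # called RMSEP when evaluated on the prediction set
rp = pearson_r(y_ref, y_hat)  # called Rp when evaluated on the prediction set
print(f"RMSEP = {rmsep:.4f}, Rp = {rp:.4f}")
```

Applying the same `rmse` function to the training set gives RMSEC. A lower RMSEP and an Rp closer to 1 both indicate better prediction.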
-
Table 1. SSAE model results under different preprocessing and feature selection methods

| Preprocessing | Feature selection | RMSEC | RMSEP | Rp |
|---|---|---|---|---|
| SG | CARS | 4.6994 | 0.6466 | 0.6940 |
| SG | Pearson | 1.1321 | 0.4435 | 0.8696 |
| SG | None | 1.8665 | 0.4302 | 0.8778 |
| MSC | CARS | 1.5124 | 0.6942 | 0.6344 |
| MSC | Pearson | 1.7429 | 0.7063 | 0.6177 |
| MSC | None | 1.7991 | 0.6540 | 0.6854 |
| SNV | CARS | 5.1595 | 0.6002 | 0.7439 |
| SNV | Pearson | 5.2719 | 0.6342 | 0.7081 |
| SNV | None | 5.8429 | 0.6692 | 0.6669 |
| 1st-D | CARS | 1.4947 | 0.8981 | 0.0088 |
| 1st-D | Pearson | 0.9399 | 0.8980 | 0.0148 |
| 1st-D | None | 1.1199 | 0.8982 | 0.0088 |
| DT | CARS | 1.4034 | 0.7954 | 0.4645 |
| DT | Pearson | 1.8946 | 0.6792 | 0.6543 |
| DT | None | 1.6619 | 0.5357 | 0.8027 |
| Raw data | CARS | 1.5709 | 0.4717 | 0.8510 |
| Raw data | Pearson | 1.1095 | 0.4154 | 0.8866 |
| Raw data | None | 2.0287 | 0.4267 | 0.8799 |
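Pearson-based feature selection, the option that gave the lowest RMSEP in Table 1, can be illustrated with the sketch below. It ranks each wavelength by the absolute Pearson correlation between its absorbance and the anthocyanin content and keeps the top k. The source does not say whether the 178 wavelengths were chosen by a top-k rule or by a correlation threshold, so the top-k rule, the function names, and the toy data are all assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_wavelengths(spectra, target, k):
    """Return the sorted indices of the k columns with the largest |r| to target."""
    n_features = len(spectra[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in spectra]
        scores.append((abs(pearson_r(column, target)), j))
    top = sorted(scores, key=lambda s: s[0], reverse=True)[:k]
    return sorted(j for _, j in top)


# Toy data: 5 samples x 4 wavelengths (hypothetical, not from the study).
spectra = [
    [1.0, 5.0, 2.0, 10.0],
    [2.0, 3.0, 2.0, 8.0],
    [3.0, 4.0, 2.0, 6.0],
    [4.0, 1.0, 2.0, 4.0],
    [5.0, 2.0, 2.0, 2.0],
]
target = [1.0, 2.0, 3.0, 4.0, 5.0]

selected = select_wavelengths(spectra, target, k=2)
print(selected)
```

In the toy data, columns 0 and 3 are perfectly correlated (positively and negatively) with the target, so they are the two retained; the constant column 2 carries no information and scores zero.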
Table 2. SSAE model results under different activation functions
| Activation function | RMSEC | RMSEP | Rp |
|---|---|---|---|
| tanh | 1.0191 | 0.4814 | 0.8442 |
| ReLU | 1.1095 | 0.4154 | 0.8866 |
| None (no activation function) | 0.8755 | 0.4493 | 0.8659 |
Table 3. SSAE modeling results for different neuron configurations
| Neuron configuration | RMSEC | RMSEP | Rp |
|---|---|---|---|
| (90,30) | 0.9451 | 0.4155 | 0.8866 |
| (90,15) | 1.0142 | 0.3929 | 0.8992 |
| (90,10) | 1.0493 | 0.3852 | 0.9034 |
| (60,30) | 0.9890 | 0.4240 | 0.8816 |
| (60,15) | 1.0980 | 0.4107 | 0.8893 |
| (60,10) | 1.1207 | 0.3915 | 0.9000 |
| (60,5) | 1.0500 | 0.3835 | 0.9042 |
| (30,15) | 1.1805 | 0.4162 | 0.8861 |
| (30,10) | 1.2605 | 0.4403 | 0.8716 |
| (30,5) | 1.1095 | 0.4154 | 0.8866 |
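To make the neuron configurations concrete, the sketch below builds the prediction path of a 178-60-5-1 network, the structure the abstract reports as optimal, and runs one forward pass. It is an illustration under stated assumptions rather than the authors' implementation: the hidden layers use ReLU and the output is linear, the weights are random (He-style scaling is an arbitrary choice here), and the reconstruction branches that a supervised autoencoder uses during training are omitted. All names are hypothetical.

```python
import random


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def init_layer(n_in, n_out, rng):
    """Random weights (He-style scale, an assumption) and zero biases."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(values, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def predict(x, layers):
    """Forward pass: ReLU on hidden layers, linear single-node output."""
    h = x
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return dense(h, weights, biases)[0]


LAYER_SIZES = [178, 60, 5, 1]  # input wavelengths, two hidden layers, output
rng = random.Random(0)
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]

# Count trainable parameters of the prediction path (weights plus biases).
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

x = [rng.random() for _ in range(LAYER_SIZES[0])]  # hypothetical absorbances
y_hat = predict(x, layers)
print(n_params, y_hat)
```

Each pair in Table 3 corresponds to the two hidden-layer sizes, so (60,5) maps to `LAYER_SIZES = [178, 60, 5, 1]`. With this layout the prediction path has 178·60 + 60 + 60·5 + 5 + 5·1 + 1 = 11051 parameters.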
Table 4. Modeling prediction results of different models
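| Model | Input dimension | RMSEP | Rp |
|---|---|---|---|
| O+Pearson+SSAE | 178 | 0.3835 | 0.9042 |
| SG+O+ELM | 2000 | 0.4025 | 0.8916 |
| O+Pearson+LSSVR | 178 | 0.4479 | 0.8746 |
| 1st-D+CARS+PLSR | 136 | 0.4012 | 0.8939 |

Note: "O" in the table indicates that no preprocessing or no feature screening was applied.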
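The size of the accuracy gain can be read off Table 4 with a little arithmetic. The sketch below computes how much lower the SSAE model's RMSEP is than each baseline, as a percentage of the baseline's RMSEP. The RMSEP values are copied from the table; the helper name and the choice of relative reduction as the summary statistic are assumptions made here for illustration.

```python
# RMSEP values copied from Table 4.
RMSEP = {
    "O+Pearson+SSAE": 0.3835,
    "SG+O+ELM": 0.4025,
    "O+Pearson+LSSVR": 0.4479,
    "1st-D+CARS+PLSR": 0.4012,
}


def relative_reduction(baseline, candidate):
    """Percentage by which the candidate RMSEP is lower than the baseline RMSEP."""
    return 100.0 * (baseline - candidate) / baseline


ssae = RMSEP["O+Pearson+SSAE"]
reductions = {
    name: round(relative_reduction(value, ssae), 2)
    for name, value in RMSEP.items()
    if name != "O+Pearson+SSAE"
}

for name, pct in reductions.items():
    print(f"SSAE vs {name}: RMSEP lower by {pct:.2f}%")
```

On these figures the SSAE model's RMSEP is about 4.7% lower than ELM, 14.4% lower than LSSVR, and 4.4% lower than PLSR.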