Deep Perceptron Structure Design via Layer-wise Principal Component Analysis
-
Abstract: To solve the structure-design problem of deep perceptrons, a layer-wise principal component analysis (LPCA) method is proposed. Based on the distribution of the training data, and with the information loss kept under appropriate control, the method effectively determines the number of neurons in each layer. First, the numbers of neurons in the input and output layers are set to the sample dimension and the number of class labels, respectively. Then, principal component analysis (PCA) is applied to the training set, and the reduced dimension gives the number of neurons in the second layer. Finally, for each remaining layer, the samples produced by the previous dimensionality reduction are passed through the nonlinear activation function and analyzed with PCA again; the resulting reduced dimension is taken as the number of neurons in that layer. Experimental results on the MNIST handwritten-digit data set show that the method helps simplify the structure of a deep perceptron and offers clear advantages in reducing the number of parameters, shortening convergence time, and lowering training difficulty. The idea of LPCA thus provides a new reference for the structural design of deep perceptrons and for their applications.
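The layer-wise procedure summarized above lends itself to a short NumPy sketch. The code below is illustrative rather than the authors' implementation: the 0.95 variance-retention threshold and the sigmoid activation are assumptions made for the example, since the abstract only speaks of appropriately controlling information loss and of "the nonlinear activation function".

```python
import numpy as np

def pca_reduce(X, retained=0.95):
    """Project X onto the fewest principal components that keep `retained`
    of the total variance; return (reduced dimension, projected data)."""
    Xc = X - X.mean(axis=0)                       # center the samples
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)         # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(explained), retained)) + 1
    return k, Xc @ Vt[:k].T                       # dimension and PCA scores

def lpca_structure(X, n_classes, n_hidden, retained=0.95):
    """Layer-wise PCA sketch: propose the width of every layer of a deep
    perceptron from the training data X (n_samples x n_features)."""
    widths = [X.shape[1]]                         # input layer = sample dimension
    Z = X
    for _ in range(n_hidden):
        k, Z = pca_reduce(Z, retained)            # PCA of the (activated) samples
        widths.append(k)                          # reduced dimension = layer width
        Z = 1.0 / (1.0 + np.exp(-Z))              # sigmoid activation (assumed)
    widths.append(n_classes)                      # output layer = number of labels
    return widths

if __name__ == "__main__":
    # Random data standing in for the 784-dimensional MNIST images, 10 classes.
    X = np.random.rand(500, 784)
    print(lpca_structure(X, n_classes=10, n_hidden=3))
```

On the actual MNIST training set, and with a suitable retention threshold, a procedure of this kind would produce layer widths of the sort reported in Table 2 (784-388-352-325-...-10); the exact values depend on the chosen threshold.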
-
Table 1. Data and results of comparison experiments

Experiment | Hinton's experiment | Layer-wise PCA experiment | Layer-wise PCA experiment
Number of network layers | 5 | 5 | 6
Network structure | 784-500-500-2000-10 | 784-388-352-325-10 | 784-388-352-325-302-10
Total number of neurons | 3794 | 1859 | 2161
Number of parameters | 1.67×10⁶ | 5.59×10⁵ | 6.58×10⁵
Convergence time/h (trained on the same machine) | 10.218 | 2.121 | 2.300
Test-set error rate/% | 1.20①, 1.14② | 1.15 | 1.09

① In reference [1], the test-set error rate of Hinton's experiment is 1.20%; ② in reference [45], fine-tuning the whole network with the conjugate gradient descent algorithm reduces the error rate to 1.14%.
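The neuron and parameter totals in Table 1 follow directly from the layer widths; they match a fully connected network with one bias per non-input neuron, which is an assumption about how the parameters were counted. A quick check:

```python
def mlp_counts(widths):
    """Total neurons and trainable parameters (weights plus one bias per
    non-input neuron) of a fully connected perceptron."""
    neurons = sum(widths)
    params = sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))
    return neurons, params

for w in ([784, 500, 500, 2000, 10],
          [784, 388, 352, 325, 10],
          [784, 388, 352, 325, 302, 10]):
    print(w, mlp_counts(w))
# 3794 neurons / 1 665 010 params, 1859 / 559 493, 2161 / 657 715,
# i.e. roughly 1.67×10⁶, 5.59×10⁵ and 6.58×10⁵ as in Table 1.
```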
Table 2. Experiment data and results
Layers | Deep perceptron structure | Zero training error (iterations, time/h) | Network convergence (iterations, time/h) | Test error rate/% (at convergence, lowest during training)
3 | 784-388-10 | 32, 0.497 | 129, 2.514 | 1.58, 1.51
4 | 784-388-352-10 | 39, 0.538 | 105, 2.132 | 1.39, 1.29
5 | 784-388-352-325-10 | 28, 0.463 | 63, 2.121 | 1.15, 1.11
6 | 784-388-352-325-302-10 | 27, 0.724 | 54, 2.300 | 1.09, 1.06
7 | 784-388-352-325-302-282-10 | 28, 1.074 | 48, 2.415 | 1.15, 1.14
8 | 784-388-352-325-302-282-264-10 | 25, 2.303 | 48, 4.834 | 1.19, 1.15
-
[1] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[2] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[3] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[4] FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(8): 1915-1929.
[5] HINTON G E, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[6] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12(1): 2493-2537.
[7] MIKOLOV T, DEORAS A, POVEY D, et al. Strategies for training large scale neural network language models[C]//2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Washington D C: IEEE, 2011: 196-201.
[8] MAIRESSE F, YOUNG S. Stochastic language generation in dialogue using factored language models[J]. Computational Linguistics, 2014, 40(4): 763-799.
[9] SARIKAYA R, HINTON G E, DEORAS A. Application of deep belief networks for natural language understanding[J]. IEEE/ACM Transactions on Audio, Speech & Language Processing, 2014, 22(4): 778-784.
[10] NOBLE W S. What is a support vector machine?[J]. Nature Biotechnology, 2006, 24(12): 1565-1567.
[11] CHAPELLE O. Training a support vector machine in the primal[J]. Neural Computation, 2007, 19(5): 1155-1178.
[12] SCHAPIRE R E. A brief introduction to boosting[C]//Proceedings of the 16th International Joint Conference on Artificial Intelligence. 1999: 1401-1406.
[13] PHILLIPS S J, ANDERSON R P, SCHAPIRE R E. Maximum entropy modeling of species geographic distributions[J]. Ecological Modelling, 2006, 190(3): 231-259.
[14] KUSAKUNNIRAN W, WU Q, ZHANG J, et al. Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron[J]. Pattern Recognition Letters, 2012, 33(7): 882-889.
[15] SUN K, HUANG S H, WONG S H, et al. Design and application of a variable selection method for multilayer perceptron neural network with LASSO[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 99: 1-11.
[16] HINTON G E. Learning multiple layers of representation[J]. Trends in Cognitive Sciences, 2007, 11(11): 428-434.
[17] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[J]. Advances in Neural Information Processing Systems, 2007, 19: 153-160.
[18] HÅSTAD J, GOLDMANN M. On the power of small-depth threshold circuits[J]. Computational Complexity, 1990, 1(2): 610-618.
[19] KOLEN J, KREMER S. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies[J]. 2003, 28(2): 237-243.
[20] BEBIS G, GEORGIOPOULOS M. Feed-forward neural networks: why network size is so important[J]. IEEE Potentials, 1994, 13(4): 27-31.
[21] REED R. Pruning algorithms: a survey[J]. IEEE Transactions on Neural Networks, 1993, 4(5): 740-747.
[22] YANG Z J, SHI Z K. Architecture optimization for neural networks[J]. Computer Engineering and Applications, 2004, 40(25): 52-54. (in Chinese)
[23] COSTA M A, BRAGA A P, DE MENEZES B R. Constructive and pruning methods for neural network design[C]//VII Brazilian Symposium on Neural Networks, 2002. Washington D C: IEEE, 2002: 49-54.
[24] STATHAKIS D. How many hidden layers and nodes?[J]. International Journal of Remote Sensing, 2009, 30(8): 2133-2147.
[25] MOZER M C, SMOLENSKY P. Skeletonization: a technique for trimming the fat from a network via relevance assessment[C]//Advances in Neural Information Processing Systems 1. San Francisco: Morgan Kaufmann Publishers Inc, 1989: 107-115.
[26] CUN Y L, DENKER J S, SOLLA S A. Optimal brain damage[C]//Advances in Neural Information Processing Systems 2. San Francisco: Morgan Kaufmann Publishers Inc, 1990: 598-605.
[27] MRAZOVA I, REITERMANOVA Z. A new sensitivity-based pruning technique for feed-forward neural networks that improves generalization[C]//International Joint Conference on Neural Networks. Washington D C: IEEE, 2011: 2143-2150.
[28] HINAMOTO T, HAMANAKA T, MAEKAWA S, et al. Generalizing smoothness constraints from discrete samples[J]. Neural Computation, 2008, 2(2): 188-197.
[29] HUBERMAN B B, WEIGEND A, RUMELHART D. Back-propagation, weight-elimination and time series prediction[C]//Proceedings of the 1990 Connectionist Models Summer School. San Francisco: Morgan Kaufmann Publishers Inc, 1990: 105-116.
[30] SIETSMA J, DOW R J F. Creating artificial neural networks that generalize[J]. Neural Networks, 1991, 4(1): 67-79.
[31] SIETSMA J, DOW R J F. Neural net pruning: why and how[C]//IEEE International Conference on Neural Networks, 1988. Washington D C: IEEE, 1988: 325-333.
[32] LEUNG F F, LAM H K, LING S H, et al. Tuning of the structure and parameters of a neural network using an improved genetic algorithm[J]. IEEE Transactions on Neural Networks, 2003, 14(1): 79-88.
[33] YEUNG D S, ZENG X Q. Hidden neuron pruning for multilayer perceptrons using a sensitivity measure[C]//International Conference on Machine Learning and Cybernetics, 2002. Washington D C: IEEE, 2002: 1751-1757.
[34] FNAIECH F, FNAIECH N, NAJIM M. A new feedforward neural network hidden layer neuron pruning algorithm[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D C: IEEE, 2001: 1277-1280.
[35] PUKRITTAYAKAMEE A, HAGAN M, RAFF L, et al. A network pruning algorithm for combined function and derivative approximation[C]//International Joint Conference on Neural Networks. Washington D C: IEEE, 2009: 2553-2560.
[36] SHARMA S K, CHANDRA P. Constructive neural networks: a review[J]. International Journal of Engineering Science & Technology, 2010, 2(12): 7847-7855.
[37] FREAN M. The upstart algorithm: a method for constructing and training feedforward neural networks[J]. Neural Computation, 1990, 2(2): 198-209.
[38] FAHLMAN S E, LEBIERE C. The cascade-correlation learning architecture[J]. Advances in Neural Information Processing Systems, 1990, 2: 524-532.
[39] TSOI A C, HAGENBUCHNER M, MICHELI A. Building MLP networks by construction[C]//IEEE-INNS-ENNS International Joint Conference on Neural Networks. Washington D C: IEEE, 2000: 4549-4549.
[40] ISLAM M M, SATTAR M A, AMIN M F, et al. A new adaptive merging and growing algorithm for designing artificial neural networks[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(3): 705-722.
[41] SRIDHAR S S, PONNAVAIKKO M. Improved adaptive learning algorithm for constructive neural networks[J]. International Journal of Computer and Electrical Engineering, 2011, 3(1): 30-36.
[42] FAN J N, WANG Z L, QIAN F. Research progress on structural design of hidden layer in BP artificial neural networks[J]. (in Chinese)
[43] ABDI H, WILLIAMS L J. Principal component analysis[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(4): 433-459.
[44] SHLENS J. A tutorial on principal component analysis[J]. Eprint arXiv, 2014, 58(3): 219-226.
[45] YANG J, ZHANG D, FRANGI A F, et al. Two-dimensional PCA: a new approach to appearance-based face representation and recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2004, 26(1): 131-137.
[46] LECUN Y, CORTES C, BURGES C J C. The MNIST database of handwritten digits[EB/OL]. [2016-06-10]. http://yann.lecun.com/exdb/mnist.
[47] HINTON G E, SALAKHUTDINOV R R. Supporting online material for "Reducing the dimensionality of data with neural networks"[J]. Science, 2006, 313(5786).