Deep Perceptron Structure Design via Layer-wise Principal Component Analysis
-
Abstract: To solve the structure-design problem of deep perceptrons, a layer-wise principal component analysis (LPCA) method is proposed. Based on the distribution of the training data, and with the information loss kept under appropriate control, the method effectively determines the number of neurons in each layer. First, the numbers of neurons in the input and output layers are set to the sample dimension and the number of class labels, respectively. Then, principal component analysis (PCA) is applied to the training set, and the reduced dimension gives the number of neurons in the second layer. Finally, for each remaining layer, the samples produced by the previous dimensionality reduction are passed through the nonlinear activation function and analyzed with PCA again; the resulting reduced dimension is taken as the number of neurons in that layer. Experimental results on the MNIST handwritten-digit data set show that the method helps simplify the structure of a deep perceptron and offers clear advantages in reducing the number of parameters, shortening convergence time, and lowering training difficulty. The idea of LPCA thus provides a new reference for the structural design of deep perceptrons and for their applications.
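The layer-wise procedure summarized above lends itself to a short NumPy sketch. The code below is illustrative rather than the authors' implementation: the 0.95 variance-retention threshold and the sigmoid activation are assumptions made for the example, since the abstract only speaks of appropriately controlling information loss and of "the nonlinear activation function".

```python
import numpy as np

def pca_reduce(X, retained=0.95):
    """Project X onto the fewest principal components that keep `retained`
    of the total variance; return (reduced dimension, projected data)."""
    Xc = X - X.mean(axis=0)                       # center the samples
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)         # explained-variance ratios
    k = int(np.searchsorted(np.cumsum(explained), retained)) + 1
    return k, Xc @ Vt[:k].T                       # dimension and PCA scores

def lpca_structure(X, n_classes, n_hidden, retained=0.95):
    """Layer-wise PCA sketch: propose the width of every layer of a deep
    perceptron from the training data X (n_samples x n_features)."""
    widths = [X.shape[1]]                         # input layer = sample dimension
    Z = X
    for _ in range(n_hidden):
        k, Z = pca_reduce(Z, retained)            # PCA of the (activated) samples
        widths.append(k)                          # reduced dimension = layer width
        Z = 1.0 / (1.0 + np.exp(-Z))              # sigmoid activation (assumed)
    widths.append(n_classes)                      # output layer = number of labels
    return widths

if __name__ == "__main__":
    # Random data standing in for the 784-dimensional MNIST images, 10 classes.
    X = np.random.rand(500, 784)
    print(lpca_structure(X, n_classes=10, n_hidden=3))
```

On the actual MNIST training set, and with a suitable retention threshold, a procedure of this kind would produce layer widths of the sort reported in Table 2 (784-388-352-325-...-10); the exact values depend on the chosen threshold.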
-
Table 1. Data and results of comparison experiments

Experiment | Hinton's experiment | Layer-wise PCA experiment | Layer-wise PCA experiment
Number of network layers | 5 | 5 | 6
Network structure | 784-500-500-2000-10 | 784-388-352-325-10 | 784-388-352-325-302-10
Total number of neurons | 3794 | 1859 | 2161
Number of parameters | 1.67×10⁶ | 5.59×10⁵ | 6.58×10⁵
Convergence time/h (trained on the same machine) | 10.218 | 2.121 | 2.300
Test-set error rate/% | 1.20①, 1.14② | 1.15 | 1.09

① In reference [1], the test-set error rate of Hinton's experiment is 1.20%; ② in reference [45], fine-tuning the whole network with the conjugate gradient descent algorithm reduces the error rate to 1.14%.
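The neuron and parameter totals in Table 1 follow directly from the layer widths; they match a fully connected network with one bias per non-input neuron, which is an assumption about how the parameters were counted. A quick check:

```python
def mlp_counts(widths):
    """Total neurons and trainable parameters (weights plus one bias per
    non-input neuron) of a fully connected perceptron."""
    neurons = sum(widths)
    params = sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))
    return neurons, params

for w in ([784, 500, 500, 2000, 10],
          [784, 388, 352, 325, 10],
          [784, 388, 352, 325, 302, 10]):
    print(w, mlp_counts(w))
# 3794 neurons / 1 665 010 params, 1859 / 559 493, 2161 / 657 715,
# i.e. roughly 1.67×10⁶, 5.59×10⁵ and 6.58×10⁵ as in Table 1.
```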
Table 2. Experiment data and results
Layers | Deep perceptron structure | Zero training error (iterations, time/h) | Network convergence (iterations, time/h) | Test error rate/% (at convergence, lowest during training)
3 | 784-388-10 | 32, 0.497 | 129, 2.514 | 1.58, 1.51
4 | 784-388-352-10 | 39, 0.538 | 105, 2.132 | 1.39, 1.29
5 | 784-388-352-325-10 | 28, 0.463 | 63, 2.121 | 1.15, 1.11
6 | 784-388-352-325-302-10 | 27, 0.724 | 54, 2.300 | 1.09, 1.06
7 | 784-388-352-325-302-282-10 | 28, 1.074 | 48, 2.415 | 1.15, 1.14
8 | 784-388-352-325-302-282-264-10 | 25, 2.303 | 48, 4.834 | 1.19, 1.15
-
[1] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[2] HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
[3] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25: 1097-1105.
[4] FARABET C, COUPRIE C, NAJMAN L, et al. Learning hierarchical features for scene labeling[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(8): 1915-1929.
[5] HINTON G E, DENG L, YU D, et al. Deep neural networks for acoustic modeling in speech recognition[J]. IEEE Signal Processing Magazine, 2012, 29(6): 82-97.
[6] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of Machine Learning Research, 2011, 12(1): 2493-2537.
[7] MIKOLOV T, DEORAS A, POVEY D, et al. Strategies for training large scale neural network language models[C]//2011 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). Washington D C: IEEE, 2011: 196-201.
[8] MAIRESSE F, YOUNG S. Stochastic language generation in dialogue using factored language models[J]. Computational Linguistics, 2014, 40(4): 763-799.
[9] SARIKAYA R, HINTON G E, DEORAS A. Application of deep belief networks for natural language understanding[J]. IEEE/ACM Transactions on Audio, Speech & Language Processing, 2014, 22(4): 778-784.
[10] NOBLE W S. What is a support vector machine?[J]. Nature Biotechnology, 2006, 24(12): 1565-1567.
[11] CHAPELLE O. Training a support vector machine in the primal[J]. Neural Computation, 2007, 19(5): 1155-1178.
[12] SCHAPIRE R E. A brief introduction to boosting[C]//Proceedings of the 16th International Joint Conference on Artificial Intelligence. 1999: 1401-1406.
[13] PHILLIPS S J, ANDERSON R P, SCHAPIRE R E. Maximum entropy modeling of species geographic distributions[J]. Ecological Modelling, 2006, 190(3): 231-259.
[14] KUSAKUNNIRAN W, WU Q, ZHANG J, et al. Cross-view and multi-view gait recognitions based on view transformation model using multi-layer perceptron[J]. Pattern Recognition Letters, 2012, 33(7): 882-889.
[15] SUN K, HUANG S H, WONG S H, et al. Design and application of a variable selection method for multilayer perceptron neural network with LASSO[J]. IEEE Transactions on Neural Networks and Learning Systems, 2016, 99: 1-11.
[16] HINTON G E. Learning multiple layers of representation[J]. Trends in Cognitive Sciences, 2007, 11(11): 428-434.
[17] BENGIO Y, LAMBLIN P, POPOVICI D, et al. Greedy layer-wise training of deep networks[J]. Advances in Neural Information Processing Systems, 2007, 19: 153-160.
[18] HÅSTAD J, GOLDMANN M. On the power of small-depth threshold circuits[J]. Computational Complexity, 1990, 1(2): 610-618.
[19] KOLEN J, KREMER S. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies[J]. 2003, 28(2): 237-243.
[20] BEBIS G, GEORGIOPOULOS M. Feed-forward neural networks: why network size is so important[J]. IEEE Potentials, 1994, 13(4): 27-31.
[21] REED R. Pruning algorithms: a survey[J]. IEEE Transactions on Neural Networks, 1993, 4(5): 740-747.
[22] YANG Z J, SHI Z K. Architecture optimization for neural networks[J]. Computer Engineering and Applications, 2004, 40(25): 52-54. (in Chinese)
[23] COSTA M A, BRAGA A P, DE MENEZES B R. Constructive and pruning methods for neural network design[C]//VII Brazilian Symposium on Neural Networks, 2002. Washington D C: IEEE, 2002: 49-54.
[24] STATHAKIS D. How many hidden layers and nodes?[J]. International Journal of Remote Sensing, 2009, 30(8): 2133-2147.
[25] MOZER M C, SMOLENSKY P. Skeletonization: a technique for trimming the fat from a network via relevance assessment[C]//Advances in Neural Information Processing Systems 1. San Francisco: Morgan Kaufmann Publishers Inc, 1989: 107-115.
[26] CUN Y L, DENKER J S, SOLLA S A. Optimal brain damage[C]//Advances in Neural Information Processing Systems 2. San Francisco: Morgan Kaufmann Publishers Inc, 1990: 598-605.
[27] MRAZOVA I, REITERMANOVA Z. A new sensitivity-based pruning technique for feed-forward neural networks that improves generalization[C]//International Joint Conference on Neural Networks. Washington D C: IEEE, 2011: 2143-2150.
[28] HINAMOTO T, HAMANAKA T, MAEKAWA S, et al. Generalizing smoothness constraints from discrete samples[J]. Neural Computation, 2008, 2(2): 188-197.
[29] HUBERMAN B B, WEIGEND A, RUMELHART D. Back-propagation, weight-elimination and time series prediction[C]//Proceedings of the 1990 Connectionist Models Summer School. San Francisco: Morgan Kaufmann Publishers Inc, 1990: 105-116.
[30] SIETSMA J, DOW R J F. Creating artificial neural networks that generalize[J]. Neural Networks, 1991, 4(1): 67-79.
[31] SIETSMA J, DOW R J F. Neural net pruning: why and how[C]//IEEE International Conference on Neural Networks, 1988. Washington D C: IEEE, 1988: 325-333.
[32] LEUNG F F, LAM H K, LING S H, et al. Tuning of the structure and parameters of a neural network using an improved genetic algorithm[J]. IEEE Transactions on Neural Networks, 2003, 14(1): 79-88.
[33] YEUNG D S, ZENG X Q. Hidden neuron pruning for multilayer perceptrons using a sensitivity measure[C]//International Conference on Machine Learning and Cybernetics, 2002. Washington D C: IEEE, 2002: 1751-1757.
[34] FNAIECH F, FNAIECH N, NAJIM M. A new feedforward neural network hidden layer neuron pruning algorithm[C]//IEEE International Conference on Acoustics, Speech, and Signal Processing. Washington D C: IEEE, 2001: 1277-1280.
[35] PUKRITTAYAKAMEE A, HAGAN M, RAFF L, et al. A network pruning algorithm for combined function and derivative approximation[C]//International Joint Conference on Neural Networks. Washington D C: IEEE, 2009: 2553-2560.
[36] SHARMA S K, CHANDRA P. Constructive neural networks: a review[J]. International Journal of Engineering Science & Technology, 2010, 2(12): 7847-7855.
[37] FREAN M. The upstart algorithm: a method for constructing and training feedforward neural networks[J]. Neural Computation, 1990, 2(2): 198-209.
[38] FAHLMAN S E, LEBIERE C. The cascade-correlation learning architecture[J]. Advances in Neural Information Processing Systems, 1990, 2: 524-532.
[39] TSOI A C, HAGENBUCHNER M, MICHELI A. Building MLP networks by construction[C]//IEEE-INNS-ENNS International Joint Conference on Neural Networks. Washington D C: IEEE, 2000: 4549-4549.
[40] ISLAM M M, SATTAR M A, AMIN M F, et al. A new adaptive merging and growing algorithm for designing artificial neural networks[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2009, 39(3): 705-722.
[41] SRIDHAR S S, PONNAVAIKKO M. Improved adaptive learning algorithm for constructive neural networks[J]. International Journal of Computer and Electrical Engineering, 2011, 3(1): 30-36.
[42] FAN J N, WANG Z L, QIAN F. Research progress on structural design of hidden layer in BP artificial neural networks[J]. (in Chinese)
[43] ABDI H, WILLIAMS L J. Principal component analysis[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(4): 433-459.
[44] SHLENS J. A tutorial on principal component analysis[J]. Eprint arXiv, 2014, 58(3): 219-226.
[45] YANG J, ZHANG D, FRANGI A F, et al. Two-dimensional PCA: a new approach to appearance-based face representation and recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2004, 26(1): 131-137.
[46] LECUN Y, CORTES C, BURGES C J C. The MNIST database of handwritten digits[EB/OL]. [2016-06-10]. http://yann.lecun.com/exdb/mnist.
[47] HINTON G E, SALAKHUTDINOV R R. Supporting online material for "Reducing the dimensionality of data with neural networks"[J]. Science, 2006, 313(5786).