PCA Shuffling Initialization of Convolutional Neural Networks
-
Abstract: To initialize convolutional neural networks more effectively, an efficient method for initializing convolution kernels, named principal component analysis (PCA) Shuffling initialization, was proposed. The method consisted of three steps. First, for the first convolutional layer, all receptive fields of each input feature map were sampled over the training set. Then, PCA was performed on the sampled image patches separately for each input feature map, and the resulting projection matrices were used to initialize the filters of that layer. Finally, the first two steps were repeated layer by layer to initialize the filters of the remaining convolutional layers. Convolutional-layer initialization experiments were conducted on the MNIST and CIFAR-10 datasets. The results show that, compared with commonly used methods such as random initialization and Xavier initialization, the proposed method both speeds up network training and improves test-set accuracy.
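The abstract describes the procedure only at a high level, so the following pure-Python sketch illustrates one plausible reading of it rather than the authors' implementation. The assumptions are ours: stride-1 "valid" k x k convolutions, a ReLU after each layer, and filter f's slice for input map c taken as the f-th principal direction (a row of that map's PCA projection matrix), which limits each layer to at most k² filters. The "shuffling" step named in the title is not described in the abstract and is not modelled. All function names (`extract_patches`, `top_eigenvectors`, `pca_init_layer`, `conv_forward`, `pca_init_network`) are hypothetical, and power iteration stands in for a library eigensolver to keep the block dependency-free.

```python
import random

# Hypothetical sketch of layer-wise PCA filter initialisation (not the paper's code).
# Assumed data layout: a sample is a list of C feature maps, each a list of rows.
# The "shuffling" step is not specified in the abstract and is not modelled here.


def extract_patches(fmap, k):
    """All k x k receptive fields (stride 1, no padding) of one 2-D map, flattened."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1) for j in range(w - k + 1)]


def covariance(vectors):
    """Covariance matrix of the mean-centred row vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[t] for v in vectors) / n for t in range(d)]
    centred = [[v[t] - mean[t] for t in range(d)] for v in vectors]
    return [[sum(c[a] * c[b] for c in centred) / n for b in range(d)] for a in range(d)]


def _norm(v):
    return sum(x * x for x in v) ** 0.5


def _project_out(v, basis):
    """Remove from v its components along each unit vector in basis."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    return v


def top_eigenvectors(cov, m, iters=300, seed=0):
    """Leading m principal directions of a covariance matrix.

    Power iteration kept orthogonal to the directions already found;
    a dependency-free stand-in for a library eigensolver.
    """
    d = len(cov)
    assert m <= d, "cannot extract more components than the patch dimension"
    rng = random.Random(seed)
    vecs = []
    for _ in range(m):
        v = _project_out([rng.gauss(0.0, 1.0) for _ in range(d)], vecs)
        n = _norm(v)
        v = [x / n for x in v]
        for _ in range(iters):
            w = _project_out([sum(cov[r][c] * v[c] for c in range(d)) for r in range(d)], vecs)
            n = _norm(w)
            if n < 1e-12:  # no variance left in this subspace; keep the current unit vector
                break
            v = [x / n for x in w]
        vecs.append(v)
    return vecs


def pca_init_layer(inputs, n_filters, k):
    """Return weights[f][c] (k x k): the f-th principal direction of map c's patches."""
    n_maps = len(inputs[0])
    weights = [[None] * n_maps for _ in range(n_filters)]
    for c in range(n_maps):  # PCA is done separately for each input feature map
        patches = [p for sample in inputs for p in extract_patches(sample[c], k)]
        for f, vec in enumerate(top_eigenvectors(covariance(patches), n_filters)):
            weights[f][c] = [vec[r * k:(r + 1) * k] for r in range(k)]
    return weights


def conv_forward(sample, weights):
    """Valid, stride-1 cross-correlation followed by ReLU (assumed activation)."""
    k = len(weights[0][0])
    h, w = len(sample[0]), len(sample[0][0])
    return [[[max(0.0, sum(filt[c][di][dj] * sample[c][i + di][j + dj]
                           for c in range(len(sample))
                           for di in range(k) for dj in range(k)))
              for j in range(w - k + 1)]
             for i in range(h - k + 1)]
            for filt in weights]


def pca_init_network(images, layer_specs):
    """Initialise layers in order; each layer's PCA uses the previous layer's outputs."""
    all_weights, inputs = [], images
    for n_filters, k in layer_specs:  # layer_specs: [(n_filters, kernel_size), ...]
        weights = pca_init_layer(inputs, n_filters, k)
        all_weights.append(weights)
        inputs = [conv_forward(x, weights) for x in inputs]
    return all_weights
```

For example, `pca_init_network(images, [(4, 3), (2, 2)])` would initialize a 4-filter 3 x 3 layer from the raw images and then a 2-filter 2 x 2 layer from that layer's outputs. Because the filters for each input map come from one PCA, they are mutually orthonormal. In practice, a library routine such as NumPy's `numpy.linalg.eigh` would replace the power iteration.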