PCA Shuffling Initialization of Convolutional Neural Networks

李玉鑑; 沈成恺; 杨红丽; 胡海鹤

doi:10.11936/bjutxb2016060070

College of Computer Science, Beijing University of Technology, Beijing 100124, China

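Funding: National Natural Science Foundation of China (61175004); Specialized Research Fund for the Doctoral Program of Higher Education (20121103110029); China Postdoctoral Science Foundation (2015M580952)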

About the author: LI Yujian (李玉鑑, born 1968), male, professor; research interests include pattern recognition, image processing, machine learning, and data mining. E-mail: liyujian@bjut.edu.cn

CLC number: TP391.4
Publication history
- Received: 2016-06-23
- Published online: 2022-09-09
- Issue date: 2017-01-01

PCA Shuffling Initialization of Convolutional Neural Networks

College of Computer Science, Beijing University of Technology, Beijing 100124, China

Abstract: To better initialize convolutional neural networks, an effective method for initializing convolutional kernels, named principal component analysis (PCA) shuffling initialization, is proposed. The method consists of three steps. First, for the first convolutional layer, all receptive fields of each input feature map are sampled over the training set. Then, principal component analysis is performed on the sampled image patches separately for each input feature map, and the resulting projection matrix is used to initialize the kernels of that layer. Finally, the first two steps are repeated layer by layer for the remaining convolutional layers. Convolutional-layer initialization experiments were carried out on the MNIST and CIFAR-10 datasets. The results show that, compared with the commonly used random initialization and Xavier initialization, the proposed method trains faster and achieves higher test-set accuracy.

Keywords:
- convolutional neural network
- initialization
- principal component analysis (PCA)
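
The first two steps of the abstract (sampling receptive fields per input feature map, then using the PCA projection matrix as kernels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the stride of 1, the use of SVD to obtain the principal directions, and the restriction to at most k*k filters are all assumptions made for illustration. The shuffling step named in the title is not described in this section and is therefore omitted.

```python
import numpy as np


def sample_patches(feature_maps, k):
    """Collect every k x k receptive field (stride 1, an assumption) from one channel.

    feature_maps: array of shape (N, H, W), one input feature map over N training images.
    Returns an array of shape (N * (H-k+1) * (W-k+1), k*k).
    """
    n, h, w = feature_maps.shape
    patches = [
        feature_maps[:, i:i + k, j:j + k].reshape(n, k * k)
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]
    return np.concatenate(patches, axis=0)


def pca_components(patches, num_components):
    """Return the leading principal directions, shape (num_components, k*k)."""
    centered = patches - patches.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:num_components]


def pca_init_conv_layer(inputs, num_filters, k):
    """Build kernels of shape (num_filters, C, k, k) from inputs of shape (N, C, H, W).

    Hypothetical helper: PCA is run separately on each input channel, and the
    i-th principal direction of a channel becomes that channel's slice of filter i.
    """
    if num_filters > k * k:
        raise ValueError("this sketch supports at most k*k filters per layer")
    c = inputs.shape[1]
    kernels = np.empty((num_filters, c, k, k))
    for ch in range(c):
        comps = pca_components(sample_patches(inputs[:, ch], k), num_filters)
        kernels[:, ch] = comps.reshape(num_filters, k, k)
    return kernels


# Usage on random stand-in data (not MNIST or CIFAR-10).
rng = np.random.default_rng(0)
images = rng.standard_normal((32, 1, 12, 12))
kernels = pca_init_conv_layer(images, num_filters=6, k=5)
print(kernels.shape)  # (6, 1, 5, 5)
```

To follow the third step, one would presumably pass the training set through the layers initialized so far and call the same routine on the resulting feature maps; how the paper handles layers with more filters than principal components is not stated here.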
The authors have declared that no competing interests exist.

Figure 1. Example of a convolutional neural network

Figure 2. Structure of a convolutional layer

Figure 3. Sampling image patches from feature maps on the training set

Figure 4. Initializing convolutional kernels with principal components

Figure 5. Example of shuffling convolutional kernels

Figure 6. Results of initializing different layers with Algorithm 1 on MNIST

Figure 7. Results of different initialization methods on MNIST

Figure 8. Results of initializing different layers with Algorithm 1 on CIFAR-10

Figure 9. Results of different initialization methods on CIFAR-10
