Machine Translation of Mongolian and Chinese Natural Language Based on Statistical Analysis
-
Abstract: To improve the relatively underdeveloped state of Mongolian-Chinese machine translation in Inner Mongolia, a statistical machine translation method is adopted that takes the phrase as the basic unit of translation. Based on the maximum entropy model, a word segmentation method and a word alignment method are proposed, and the translation is output according to the reordering results. Experimental results show that the BLEU score of the improved translation system increases to some extent. The proposed method can serve as a reference for applied research on Mongolian-Chinese translation.
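The abstract reports BLEU as its evaluation metric without showing how the score is computed. As a rough illustration, the sketch below implements the standard single-reference, sentence-level BLEU (clipped n-gram precision with a brevity penalty). The function names `ngrams` and `bleu`, the uniform n-gram weights, the absence of smoothing, and the pre-tokenized input are all assumptions made for this example; the scoring tool and tokenization actually used by the authors are not stated in this excerpt.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU with uniform n-gram weights.

    Hypothetical helper for illustration; no smoothing is applied.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_precision_sum / max_n)
```

With this definition a candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0. Corpus-level BLEU, which is what system comparisons normally report, pools the n-gram counts over all sentences before taking the precisions rather than averaging sentence scores.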
-
Table 1. Phrase extraction table

Table 2. Distribution of the corpus (%)
Domain                                      Proportion
Daily expressions and phrase dialogues      45.67
Government and legal documents              20.00
Literature                                  34.33

Table 3. Case without the consideration of phrase length
Dependency value threshold    Phrase translation probability/%    BLEU
0.0                           100                                 0.2009
0.6                           88                                  0.2098
0.8                           86                                  0.2177
1.5                           81                                  0.2213
1.7                           79                                  0.2273
2.1                           73                                  0.2265
3.2                           52                                  0.2010

Table 4. Case with the consideration of phrase length
Phrase length                 2         3         4         5         6         7
Dependency value threshold    1.3       1.5       4.8       3.1       6.9       13.9
BLEU                          0.2189    0.2208    0.2120    0.2010    0.2190    0.2048
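To make the trend in Tables 3 and 4 easier to read, the short sketch below copies the reported values and picks the setting with the highest BLEU in each table. The variable names are illustrative, and comparing against the 0.0 threshold row as a reference point is an assumption of this example, not something the excerpt states.

```python
# Table 3 rows: (dependency value threshold, phrase translation probability in %, BLEU)
table3 = [
    (0.0, 100, 0.2009),
    (0.6, 88, 0.2098),
    (0.8, 86, 0.2177),
    (1.5, 81, 0.2213),
    (1.7, 79, 0.2273),
    (2.1, 73, 0.2265),
    (3.2, 52, 0.2010),
]

# Table 4 columns: (phrase length, dependency value threshold, BLEU)
table4 = [
    (2, 1.3, 0.2189),
    (3, 1.5, 0.2208),
    (4, 4.8, 0.2120),
    (5, 3.1, 0.2010),
    (6, 6.9, 0.2190),
    (7, 13.9, 0.2048),
]

# Select the entry with the highest BLEU (third field) in each table.
best3 = max(table3, key=lambda row: row[2])
best4 = max(table4, key=lambda row: row[2])

# Assumption: treat the 0.0 threshold row as the reference point for the gain.
gain = best3[2] - table3[0][2]

print("Table 3 best:", best3)
print("Table 4 best:", best4)
print("BLEU gain over the 0.0 threshold row: %.4f" % gain)
```

On the reported numbers, BLEU in Table 3 peaks at a threshold of 1.7 and falls again at larger thresholds, while in Table 4 the highest score is obtained for phrase length 3.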