Model analysis of a system for automatic statistical machine translation
PDF (Croatian)

Keywords

automatic machine translation
statistical machine translation model
language technologies
natural language processing
information and communication sciences

How to Cite

Dunđer, I. (2021). Model analysis of a system for automatic statistical machine translation. Polytechnica, 5(2), 39-47. https://doi.org/10.36978/cte.5.2.4

Abstract

Automatic machine translation is an increasingly popular research topic across scientific disciplines such as information and communication sciences, computer science and computational linguistics. The main reason is that it now enables indispensable communication and the fast transfer of information between different natural languages. This is especially important for less widely spoken languages such as Croatian, which still lack the software tools and digital resources needed to develop specialized, high-quality machine translation systems optimized for one specific domain. The ever faster growth of data and the growing needs of stakeholders in industry, the economy and science, as well as in people's everyday lives, motivate the systematic and organized development and subsequent adaptation of automatic machine translation systems for different language pairs. Since machine translations are not perfect, it is important to apply methods that computationally generate translations of an acceptable level of quality, where what counts as acceptable depends on the task itself and on the scope of implementation of the machine translation system. This paper analyzes the model of a system for automatic statistical machine translation, its components, and the role and significance of the individual elements within the model.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Copyright (c) 2021