Ireland's High-Performance Computing Centre | ICHEC

Title: Feature decay algorithms for fast deployment of accurate statistical machine translation systems
Authors: Ergun Bicici, 2013
Abstract: We use feature decay algorithms (FDA) for fast deployment of accurate statistical machine translation (SMT) systems, taking only about half a day for each translation direction. We develop parallel FDA to solve the computational scalability problems caused by the abundance of training data for SMT models and language models (LMs), and still achieve SMT performance that is on par with, or better than, using all of the training data. Parallel FDA runs separate FDA models on randomized subsets of the training data and combines the instance selections later. Parallel FDA can also be used to select the LM corpus based on the training set selected by parallel FDA. The high quality of the selected training data allows us to obtain very accurate translation outputs, close to those of the top-performing SMT systems. The selected LM corpus is relevant enough to yield up to an 86% reduction in the number of out-of-vocabulary (OOV) tokens and up to a 74% reduction in perplexity. We perform SMT experiments in all language pairs of the WMT13 translation task and obtain SMT performance close to the top systems while using significantly fewer resources for training and development.
ICHEC Project: Large Scale Experiments on the Prediction of Machine Translation Performance
Publication: In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 76–82, Sofia, Bulgaria. Association for Computational Linguistics.
URL: http://www.aclweb.org/anthology-new/W/W13/W13-2206.pdf
Status: Published
