Ireland's High-Performance Computing Centre | ICHEC


Title: Parser-based retraining for domain adaptation of probabilistic generators
Authors: Hogan et al., 2008
Abstract: While the effect of domain variation on Penn-Treebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test set). We develop a generator retraining method where the domain-specific training data is automatically produced using state-of-the-art parser output. The retraining method recovers a substantial portion of the performance drop, resulting in a generator which achieves a BLEU score of 0.61 on our BNC test data.
ICHEC Project: Parsing the British National Corpus (100M Words) with Automatically Acquired Deep probabilistic LFG Resources
URL: http://www.aclweb.org/anthology/W/W08/
Status: Published
