Ireland's High-Performance Computing Centre | ICHEC

Publication

Title: C-Structures and F-Structures for the British National Corpus (In Proceedings of the Twelfth International Lexical Functional Grammar Conference, LFG07)
Authors: Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith, 2007
Abstract: We describe how the British National Corpus (BNC), a one-hundred-million-word balanced corpus of British English, was parsed into Lexical Functional Grammar (LFG) c-structures and f-structures using a treebank-based parsing architecture. The architecture uses a state-of-the-art statistical parser and reranker trained on the Penn Treebank to produce context-free phrase-structure trees, and an annotation algorithm that automatically annotates these trees to yield LFG f-structures. We describe the pre-processing steps taken to accommodate the differences between the Penn Treebank and the BNC, and discuss some of the issues encountered in applying the parsing architecture at such a large scale. We also describe how a gold standard set of 1,000 parse trees was annotated. We present results for the c-structures produced by the statistical parser, evaluated against this c-structure gold standard, and for the f-structures produced by the annotation algorithm, evaluated against an automatically constructed f-structure gold standard. The c-structures achieve an f-score of 83.7% and the f-structures an f-score of 91.2%.
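Both figures in the abstract are f-scores: the harmonic mean of precision and recall over matched evaluation units. For c-structures these are usually labelled brackets, and for f-structures they are usually dependency triples. The Python sketch below shows only that arithmetic. It is a simplified illustration, not the evaluation software used in the paper. The function name `f_score` and the toy brackets are hypothetical.

```python
from collections import Counter


def f_score(gold, test):
    """Return (precision, recall, F1) over multisets of evaluation items.

    Items can be labelled brackets such as ("NP", start, end) or dependency
    triples such as ("subj", head, dependent). Simplified sketch: real tools
    (e.g. evalb) also apply rules such as ignoring punctuation.
    """
    gold_counts, test_counts = Counter(gold), Counter(test)
    # Multiset intersection: an item counts as matched at most as often
    # as it appears in both gold and test.
    matched = sum((gold_counts & test_counts).values())
    precision = matched / sum(test_counts.values()) if test_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy example: one of four test brackets has the wrong label.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(f_score(gold, test))  # (0.75, 0.75, 0.75)
```

Three of the four brackets match, so precision and recall are both 3/4 and the f-score is 0.75. The same function applies unchanged to sets of f-structure triples.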
ICHEC Project: Parsing the British National Corpus (100M Words) with Automatically Acquired Deep Probabilistic LFG Resources
Publication: CSLI Publications, Stanford University, 28-30, pages 418-438
URL: http://rian.ie/en/item/view/30472.html
Keywords: Machine translation; lexical functional grammar
Status: Published
