Theoretical and Practical Issues in the Construction of a Greek Dependency Treebank
|Authors:||Prokopis Prokopidis; Elina Desipri; Maria Koutsombogera; Harris Papageorgiou; Stelios Piperidis|
|Editor:||Montserrat Civit and Sandra Kübler and Ma. Antònia Martí|
|Book title:||Proceedings of The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005)|
|Organization:||Universitat de Barcelona|
In this paper, we present work in progress on the construction of the Greek Dependency Treebank (GDT). GDT currently encompasses annotation at the levels of syntax and semantics. The initial GDT dataset comprises 70K words of Greek text, pertaining mainly to EU politics, with smaller segments from the travel and health domains. The data were extracted from collections compiled to meet the needs of funded research projects focusing on multilingual, multimedia information extraction. Thus, annotation efforts aim at creating training and testing material that will aid the development of processing tools in specific application domains. At the same time, we are trying to build the basis for a reference corpus of Greek that can prove useful beyond those particular application domains, both as a resource for investigating linguistic structures in real-life texts and as training material for developing machine-learning approaches to syntactic parsing and semantic role labeling of unrestricted Greek text.