Integrating Language Technology in a web-enabled Cultural Heritage system
|Authors:||Penny Labropoulou; Harris Papageorgiou; Byron Georgantopoulos; D. Tsagogeorga; Iason Demiros; Vassilis Antonopoulos|
|Book title:||Language Resources and Evaluation Conference (LREC-2008)|
This paper describes a sophisticated web-enabled Cultural Heritage (CH) system that gives access to digital resources in various media and exploits Language Technologies (LT) to enhance the performance of its search and retrieval mechanisms. More specifically, the paper presents the system requirements and architecture, drawing on: (a) the cultural data repository and its particularities; (b) the unified metadata scheme devised for the system, which integrates elements from various metadata standards and thus provides a rich description of the resources; (c) the thesauri, one of the major pillars of the system, which provide uniform access to the resources. The LT used in building and operating the system are presented in detail, focusing on the Term Extraction and Named Entity Recognition tools used in the construction of the thesauri and in the metadata annotation process, and on the Term Matching module, which is exploited in the mining process to identify query terms that appear in a morphosyntactically similar form in the thesauri.