PANACEA: Platform for Automatic, Normalized Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies

Start Date: 01/01/2010
End Date: 31/12/2012
Funding: ICT (FP7)
Project Leader: Prokopidis Prokopis

The objective of PANACEA is to develop an infrastructure for combining language technologies that will focus on the automatic production of the large-scale Language Resources (LRs) needed by modern Machine Translation and Natural Language Processing applications. To this end, one of the project’s outcomes will be a factory that automates all stages involved in the acquisition, production, updating and maintenance of LRs, so that these resources can be used effectively across different language pairs, genres and domains. The LRs to be produced for the evaluation of the PANACEA factory will include monolingual and bilingual corpora and dictionaries. Reductions in cost, time and human effort are expected to contribute significantly to overcoming the language barriers Europe has to deal with today.

Main objectives:

  • Creation of an open, web service-based platform for the easy design of workflows that build LRs automatically
  • Development of techniques for monolingual and parallel corpora acquisition and processing
  • Use of sentential and sub-sentential aligned data for deriving bilingual dictionaries and extracting transfer grammars
  • Development of techniques for automatic acquisition of subcategorization frames, selectional preferences, multiword expressions and lexical-semantic classes