The project is expected to produce the following deliverables:
|Resource evaluation and system requirements||R||RE|
|Linguistic resources: methodology of adaptation/development, description of resources||R||RE|
|Prototype of the algorithms||PR||RE|
|Overall presentation and evaluation of the system||R||RE|
|Technology implementation plan||R||RE|
RESULTS TILL NOW
The work done up till now in the framework of METIS is described below:
- Collection and setup of the linguistic resources to be used for the purposes of the METIS project, i.e.:
- Corpora: the BNC is used as the target-language corpus and the ILSP corpus as the source-language corpus.
- Taggers: for the target corpus (the BNC) we use the Memory-Based Tagger generator (MBT), trained on part of the BNC and producing output in the CLAWS6 tagset.
- Lemmatizers: MBLEM, the Memory-Based LEMmatizer, an adapted version of a memory-based learning approach to morphological analysis (Van den Bosch and Daelemans, 1999), is used for Dutch, while the ILSP lemmatizer is used for Greek.
- Bilingual lexica: the Greek bilingual lexicon consists of about 10,000 entries, while the Dutch one contains about 110,000 translations. Both lexica contain POS and tag information for both the source and the target language.
- Tagsets: both the CLAWS6 tagset and the ILSP-Greek tagset are EAGLES/TEI compatible and encode morphological information.
- An evaluation of the resources in line with the requirements of the project, i.e. a comparison between the two tagsets and a basic structural comparison between Greek and English.
- An outline of the project's evaluation strategy was designed in order to determine our policy regarding both the statistical part of the enterprise and the overall design of the system.
- The first two versions of the software are ready and functional. The software runs in a MATLAB environment.
- Experiments have begun producing the first results. The experiments test the system's performance with regard to the main structural characteristics of the source languages.
- The Greek bilingual lexicon is being adapted to the project requirements.
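
To make the description of the bilingual lexica more concrete, the following is a minimal sketch of how an entry carrying POS and tag information on both the source and the target side might be represented and looked up. It is not the METIS implementation (the project software runs in MATLAB); the class names, field names, sample words, and source-side tag strings are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One translation pair with POS and tag information on both sides.

    All field names are assumptions made for this sketch.
    """
    source_lemma: str
    source_pos: str
    source_tag: str   # placeholder source-side tag, not a real tagset value
    target_lemma: str
    target_pos: str
    target_tag: str   # illustrative CLAWS-style tag for the English side


class BilingualLexicon:
    """Hypothetical lookup table indexed by (source lemma, source POS)."""

    def __init__(self, entries):
        self._by_source = defaultdict(list)
        for entry in entries:
            key = (entry.source_lemma, entry.source_pos)
            self._by_source[key].append(entry)

    def translations(self, lemma, pos):
        """Return the target lemmas for a source lemma, in insertion order."""
        return [e.target_lemma for e in self._by_source.get((lemma, pos), [])]


# Illustrative Dutch-to-English entries; one source lemma may have
# several translations.
lexicon = BilingualLexicon([
    LexiconEntry("huis", "N", "N-sg", "house", "N", "NN1"),
    LexiconEntry("huis", "N", "N-sg", "home", "N", "NN1"),
    LexiconEntry("lopen", "V", "V-inf", "walk", "V", "VVI"),
])

print(lexicon.translations("huis", "N"))
print(lexicon.translations("fiets", "N"))  # unknown lemma gives an empty list
```

Indexing by lemma and POS together is one possible design choice: it lets a lemmatized, tagged source word be matched against the lexicon without confusing homographs that belong to different parts of speech.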