The OROSSIMO project aimed to collect electronic terminological resources (texts and terms), i.e. scientific and technical texts in electronic form together with the terms they contain, for use in Natural Language Processing (NLP) applications.
The terminological material was collected using an adaptation of the state-of-the-art methodology of Corpus Linguistics. The main advantage of this approach is that terms are processed in their actual context. Accordingly, OROSSIMO aimed to collect not only specialized texts but also the terms linked to those texts.
The final deliverables of the project were:
- a collection of scientific and technical texts consisting of 2,500,000 words, and
- a collection of the terms contained in the texts, in both Greek and English (approximately 25,000 terms).
The collected terminological material covers the following domains:
| Mass Media