Contemporary methods for language technology research and development rely on the deployment of appropriate resources. This paradigm spans almost all areas of language technology: from speech recognition and synthesis, to information extraction technologies that convert unstructured information (textual or multimedia) into structured form, to the development of machine translation technologies.
In turn, the components and tools that enable the development of these applications rely heavily on language resources: lexical resources and/or annotated or raw corpora, depending on the learning technique adopted. Modern syntactic and semantic parsers, named entity recognisers and other language processing tools depend heavily on appropriately selected and annotated language data for their modelling. On the multilingualism and machine translation front, the prevalence of statistical machine translation makes multilingual resources an indispensable requirement.
However, building language resources is still a costly and time-consuming task. The unavailability of appropriate resources, the lack of adequate documentation for existing resources and their low degree of reusability to date are considered factors that impede the development of systems and applications.
Accomplishments so far:
ILSP has gained a central position in the language resources field at large, and has institutionalized its role as a central player in the e-infrastructure world. The notion of “language resource” has been recast to embrace tools as an integral part, since these are now integrated in the resource production process. ILSP’s accomplishments to date include:
- Resource production for training and evaluation of NLP tools and advanced HLT applications
- Repository technologies and policies for its data and tools
- Advanced research track in metadata design and specification towards better resource documentation and discovery
- Contribution to the standardization processes around ISO TC37 and the ISO DCR
- Setup of a number of language processing web services for internal use and experimentation, as well as service offerings to remote calling agents
- Design and implementation of interoperability layers for its tools and their enhanced chaining
- Development of a UIMA-based consistent service-oriented platform leveraging open standards, exposing resources and lingware, and addressing the needs of a variety of NLP-based applications
- Participation in all relevant European networks and associations, such as FLaReNet and ELRA
- Participation in the preparatory phase of the CLARIN Research Infrastructure and preparation to act as a CLARIN Service Centre
- Initiation of the Greek CLARIN network
- Initiation and participation in META-SHARE, the largest Europe-wide, open and interoperable resource infrastructure for the human language technologies field
Current R&D focus:
- Establishment of the operation of the ILSP repository for Greek data and tools
- Aggregation and provision of metadata-based documentation for the whole range of ILSP data and tools, thus optimizing their reuse and repurposing
- Provision of language processing services, both as downloadable tools and in a language-as-service (web service) fashion
- Design and implementation of syntactic and semantic interoperability layers for its language processing tools and their interconnection with external processing components
- Establishment and operation of the META-SHARE infrastructure of distributed networked repositories