PRESEMT: Pattern REcognition-based Statistically Enhanced Machine Translation

Start Date: 01/01/2010
End Date: 31/12/2012
Funding: ICT (FP7) - Research Infrastructures
Project Leader: Tambouratzis George
Website: http://www.presemt.eu

The PRESEMT (Pattern REcognition-based Statistically Enhanced MT) project has been funded under “ICT-2009.2.2: Language-based Interaction”. It is intended to lead to a flexible and adaptable MT system, based on a language-independent method whose principles ensure easy portability to new language pairs. The method attempts to overcome well-known problems of other MT approaches, e.g. the need to compile extensive bilingual corpora or to create new rules for each language pair. PRESEMT will address the issue of effectively managing multilingual content and is expected to propose a language-independent methodology based on machine learning.

Key innovation

The PRESEMT project proposes a novel approach to the problem of Machine Translation by introducing cross-disciplinary techniques, mainly borrowed from the machine learning and computational intelligence domains, into the MT paradigm.

To this end, a flexible MT system will be developed, enhanced with (a) pattern recognition approaches (such as extended clustering or neural networks) to support language-independent analysis and (b) evolutionary computation (such as Genetic Algorithms or Swarm Intelligence) for system optimisation.
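
As a rough illustration of what evolutionary optimisation of system parameters can look like, the following is a minimal genetic-algorithm sketch. It is not taken from the PRESEMT system: the function names (evolve, toy_score), the parameter ranges, the population settings and the toy objective are all assumptions made for this example. In a real MT setting the objective would presumably be a translation-quality score computed on a development set.

```python
import random


def evolve(fitness, n_params, pop_size=30, generations=60,
           mutation_rate=0.2, mutation_scale=0.1, seed=0):
    """Maximise `fitness` over vectors in [0, 1]^n_params with a simple GA.

    All settings here are illustrative assumptions, not PRESEMT values.
    """
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank candidates and keep the better half unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene is copied from one of the parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation on some genes, clipped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, mutation_scale)))
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


# Hypothetical stand-in for a translation-quality score: the closer the
# parameter vector is to TARGET, the higher (less negative) the score.
TARGET = [0.3, 0.7, 0.5]


def toy_score(params):
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


best = evolve(toy_score, n_params=len(TARGET))
```

Because the better half of each generation is carried over unchanged, the best score never decreases from one generation to the next; the mutation rate and scale control how widely the search explores around the current candidates.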

Features

Development of a novel method, based on generalised clustering techniques, for creating a language-independent phrase aligner that can also be adapted to phrasing principles specified by end users
Use of pattern recognition approaches for defining syntactic structure
Employment of techniques inspired by functional biological systems for disambiguating translations
Extensive use of automated optimisation techniques to define a process for methodically optimising system parameters
Application of machine learning methods for allowing system adaptation
Use of parallel computing architectures, as well as mainstream multi-core PC architectures, to achieve substantial advances in translation speed

Project presentation

Annual public report

Contributors

Departments

Speech, Media and Content Technologies