Subproject 1 “Language Technologies for Content and Interaction Analysis” of Action “Computational Science and Technologies: Data, Content and Interaction” (Drasi “Athena” R.C.)

Start Date: 01/01/2017
End Date: 30/06/2021
Funding: Operational Programme «Competitiveness and Entrepreneurship» 2014-2020
Project Leader: Stainhaouer Gregory

The purpose of the project is to serve the Institute’s overall plan in two ways: vertically, through specialized research and development in individual areas, and horizontally, through infrastructure activities, common platforms for organizing and distributing language resources and technologies, and a wide range of dissemination and exploitation activities. These activities range from informing the public and the international scientific community to identifying, within the Institute, technologies with particular exploitation prospects and providing them with further support, shaping and incubation.

WP1, entitled “Natural Language Processing”, focuses on the study, analysis, processing and modeling of both written and spoken language. Its aim is to study and effectively apply modern machine learning approaches to the processing of textual data, with emphasis on unsupervised learning and the exploitation of large-scale data. The goal is to redesign the existing ILSP suite of Natural Language Processing tools and improve its efficiency, to develop new technologies and linguistic representation models, and to build integrated infrastructure systems for specific areas of interest. In Machine Translation, a number of techniques for improving the translation output will be studied, including the use of multilayered language models and iterative optimization processes based on computational intelligence techniques.

WP2, entitled “Speech Technologies”, pertains to two key technologies: speech recognition and speech synthesis. In speech recognition, the main focus is on developing an intelligent reading system based on automatic recognition of child speech, which can detect signs of dyslexia in school-age children and thus serve as an early detection tool. In speech synthesis, WP2 focuses on audiovisual, or multimodal, speech synthesis.

The aim of WP3, “Physical Interaction and Embodied Communication”, is to address two of the main fields of application of embodied communication. The first concerns the improvement of ILSP tools and infrastructure in sign language technologies. The second concerns the development of a multimodal interactive system and, more specifically, of a virtual assistant that will take the form of a digital character and will be able to communicate with the user via speech.

WP4, “Learning Technologies”, concerns two fields of application of language technologies to education. The first investigates the mechanisms of comprehension of written discourse. The aim is to model the reading behavior of students with various reading skills and to support reading comprehension through feedback and personalized support during the reading process. The second will identify the design principles for a mobile application that supports real-time communication in Greek (for example, by immigrants and tourists), with the ultimate goal of developing a relevant prototype.

WP5, “Language Resources Infrastructure”, incorporates actions related to the development of textual and lexical resources, the enrichment of existing ones, and their interconnection and storage in the existing language resources infrastructure of the Institute. This concerns both reference corpora such as the Hellenic National Corpus, developed and maintained by ILSP, which will be enriched with new resources and will acquire a new, upgraded user interface, and a variety of computational lexica, whose content will be enriched and whose interconnection will be investigated.

WP6, “Actions for dissemination, promotion and exploitation of Results”, includes a set of planned actions to disseminate the project results, publicity actions, and concrete actions that directly support the exploitation of those results.