Scientific Associate


Roussis Dimitris

Dimitris Roussis has been collaborating with ILSP / “Athena” R. C. as a data scientist since 2020 and has participated in the “ELG: European Language Grid”, “ELRC3: Action on CEF Automated Translation Core Service Platform”, “COVID-19 MLIA”, and “SciLake” projects. He holds an MEng in Civil Engineering (National Technical University of Athens, 2019) and an MSc in “Data Science and Information Technologies” (National and Kapodistrian University of Athens, 2022; awarded an excellence scholarship for the highest average grade in his class). As a member of the ILSP teams, he participated in the WMT22 General Machine Translation shared task with English-Ukrainian (EN→UK and UK→EN) MT systems. His key areas of expertise and research interests include:

  • Language Resources Creation: Acquisition, processing, labelling, and filtering of monolingual and multilingual documents/websites for the creation of large textual corpora.
  • Neural Machine Translation (NMT) Systems Development: End-to-end training, domain adaptation, and rigorous evaluation of NMT systems (incl. creation of new evaluation sets).
  • Data-centric LLM (Large Language Model) Development: Constructing datasets and preprocessing data mixes for continual pretraining and instruction-tuning of LLMs.
  • Prompt Engineering for GenAI (Generative Artificial Intelligence): Prompting techniques for model distillation, data augmentation, and various complex NLP (Natural Language Processing) tasks.
  • Philosophy of Language & AI: Focused on the intersection of data safety, alignment with human preferences, and the potential vulnerabilities and risks of AI models.