Multimodal Multilingual Resources in the Subtitling Process
Authors: Stelios Piperidis; Iason Demiros; Prokopis Prokopidis; P. Vanroose; A. Hoethker; W. Daelemans; E. Sklavounou; M. Konstantinou; Y. Karavidas
Book title: Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004)
Organization: European Language Resources Association
In view of the expansion of digital television and the increasing demand to manipulate audiovisual content, tools that produce subtitles in a multilingual setting are becoming indispensable for the subtitling industry. Working in this setting, the MUSA project aims to develop a system that combines speech recognition, advanced text analysis, and machine translation to help generate multilingual subtitles. The system converts audio streams into text transcriptions, condenses the content to meet the spatio-temporal constraints of the subtitling process, and produces draft translations for two language pairs (English to French and English to Greek). Three European languages are supported: English as both source and target language for subtitle generation, and French and Greek as target languages for subtitle translation. Training and evaluating the system components requires an array of application-specific resources. The primary audiovisual data consist of BBC TV documentaries and "newsy" current affairs programmes. For each programme, the following data are captured: the actual video, its transcript or script, subtitles in English, Greek and French, and topically relevant extracts from newspapers or web sources.
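To make the "spatio-temporal constraints" concrete, the sketch below computes a character budget for one subtitle as the tighter of a space limit (characters per line times lines on screen) and a time limit (reading speed times display duration), then checks whether a condensed line fits. The limits (`MAX_CHARS_PER_LINE`, `MAX_LINES`, `MAX_CHARS_PER_SECOND`) and the filler-word list are hypothetical values chosen for illustration, not figures from MUSA. The `naive_condense` function is a deliberately simple stand-in that only deletes words from a small hypothetical list; it is not the text-analysis-based condensation the project describes.

```python
import math
import textwrap

# Hypothetical limits for illustration only; real subtitling guidelines vary.
MAX_CHARS_PER_LINE = 37
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 15.0

# Hypothetical set of low-information words the naive condenser may drop.
FILLERS = {"actually", "really", "basically", "very", "just", "quite"}


def char_budget(duration_s: float) -> int:
    """Characters allowed: the tighter of the space limit and the time limit."""
    space_limit = MAX_CHARS_PER_LINE * MAX_LINES
    time_limit = math.floor(duration_s * MAX_CHARS_PER_SECOND)
    return min(space_limit, time_limit)


def fits(text: str, duration_s: float) -> bool:
    """True if the text is within budget and wraps into at most MAX_LINES lines."""
    lines = textwrap.wrap(text, width=MAX_CHARS_PER_LINE)
    return len(text) <= char_budget(duration_s) and len(lines) <= MAX_LINES


def naive_condense(text: str, duration_s: float) -> str:
    """Drop filler words one at a time, left to right, until the subtitle fits."""
    words = text.split()
    while not fits(" ".join(words), duration_s):
        idx = next(
            (i for i, w in enumerate(words) if w.lower().strip(",.") in FILLERS),
            None,
        )
        if idx is None:
            break  # nothing left to drop; a real system would rephrase instead
        del words[idx]
    return " ".join(words)


if __name__ == "__main__":
    line = "It is actually a very, very old building"
    print(char_budget(2.0))            # budget for a 2-second subtitle
    print(naive_condense(line, 2.0))   # condensed to fit that budget
```

With a 2-second display time the time limit (30 characters) is tighter than the space limit (74), so the 40-character input is cut down by deleting fillers until it fits. Taking the minimum of the two limits reflects that a subtitle must be both displayable on screen and readable in the time it is shown.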