Align-it: Text alignment
Two languages can be more informative than one. A corpus of properly aligned texts is an extremely valuable source of information, not only for researchers in lexicography and terminology but also for a wide range of applications, such as translation memories and bilingual dictionaries and lexicons.
Alignment is the process of establishing correspondences between units of text in two languages, producing corpora of parallel texts. The level of alignment depends on the intended application: a coarse alignment links paragraphs, while a very fine-grained alignment links individual words. The most common and most useful level is sentence alignment.
Aligning sentences by hand is time-consuming and costly, and it requires skilled people with a good knowledge of each language pair to be aligned. For that reason, software applications have been developed that produce reliable alignments at minimal cost.
Align-it produces high-quality sentence alignments for an entire bilingual corpus. This output is useful both as a text pre-processing step for building other specialized tools (for example, translation memories) and as an aid for revising and evaluating translations.
Align-it also lets the user view a text and its translation side by side, with explicit links between corresponding text elements. It can also support deeper automatic analysis of translations; for example, it can be used to flag possible omissions in a translation or to signal common translation mistakes, such as terminological inconsistencies.
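The text does not say how Align-it detects omissions. As a minimal sketch, assuming an alignment is represented as a list of "beads" (each a pair of source and target sentence-index lists), a bead with an empty side, or with a strongly unbalanced character-length ratio, can be flagged for a reviewer. The function name `flag_possible_omissions` and the threshold `max_ratio=2.0` are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# A bead pairs a group of source sentence indices with a group of target indices.
Bead = Tuple[List[int], List[int]]


def flag_possible_omissions(
    beads: List[Bead],
    src: List[str],
    tgt: List[str],
    max_ratio: float = 2.0,  # hypothetical threshold, not taken from Align-it
) -> List[str]:
    """Return human-readable warnings for beads that may hide an omission."""
    warnings = []
    for k, (s_idx, t_idx) in enumerate(beads):
        if not t_idx:
            warnings.append(f"bead {k}: source {s_idx} has no translation")
            continue
        if not s_idx:
            warnings.append(f"bead {k}: target {t_idx} has no source")
            continue
        # Compare total character lengths on each side of the bead.
        s_len = sum(len(src[i]) for i in s_idx)
        t_len = sum(len(tgt[j]) for j in t_idx)
        ratio = max(s_len, t_len) / max(1, min(s_len, t_len))
        if ratio > max_ratio:
            warnings.append(f"bead {k}: length ratio {ratio:.1f} exceeds {max_ratio}")
    return warnings


src = ["In the beginning God created the heaven and the earth.",
       "And the earth was without form."]
tgt = ["Au commencement, Dieu créa les cieux et la terre."]
print(flag_possible_omissions([([0], [0]), ([1], [])], src, tgt))
# → ['bead 1: source [1] has no translation']
```

A length heuristic like this only points a human at suspicious spots; it cannot tell a genuine omission from a legitimately terse translation.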
Align-it handles six alignment cases:
For Western European languages, most sentences in one language match exactly one sentence in the other language; however, a sentence may also match two or more sentences in the other language, or none at all.
The sentence alignment task is to identify correspondences between sentences in one language and sentences in the other language. This is the first step toward the more ambitious task of finding correspondences between words.
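The text does not describe the algorithm Align-it uses. To illustrate how sentence alignment can be computed, here is a sketch of a different, well-known technique: the length-based dynamic-programming method of Gale and Church (1993). It scores each candidate "bead" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) by its prior probability and by how well the character lengths of the two sides agree, then picks the cheapest path through both texts. These six bead types are Gale and Church's; assuming they coincide with Align-it's six cases would be a guess.

```python
import math
from typing import Dict, List, Optional, Tuple

# Gale-Church bead types (source sentences, target sentences) and the prior
# probabilities reported in their paper. This is NOT Align-it's own model.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of that ratio


def bead_cost(l1: int, l2: int, bead: Tuple[int, int]) -> float:
    """Negative log-probability of aligning l1 source chars with l2 target chars."""
    prior_cost = -math.log(PRIORS[bead])
    mean = (l1 + l2 / C) / 2
    if mean == 0:
        return prior_cost
    delta = (l2 - l1 * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a deviation at least this large under N(0, 1).
    p = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return prior_cost - math.log(p)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[int], List[int]]]:
    """Return beads as (source indices, target indices) in document order."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                l1 = sum(len(s) for s in src[i:ni])
                l2 = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + bead_cost(l1, l2, (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


# One short sentence, then one long sentence translated as two halves.
print(align(["x" * 20, "x" * 60], ["y" * 20, "y" * 30, "y" * 30]))
# → [([0], [0]), ([1], [1, 2])]
```

The example yields a one-to-one bead followed by a one-to-two bead. Because the model uses only sentence lengths, it works across language pairs without dictionaries, at the cost of occasional errors that a user would then correct by hand.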
The input to Align-it is a pair of texts, as in the following figure (an extract from Genesis in the Bible corpus):
The output identifies the alignment between sentences. The first two alignments in the next figure illustrate the typical case where one English sentence aligns with exactly one French sentence, while the third one illustrates a one-to-two alignment.
Align-it further offers:
Align-it runs under Windows with a graphical user interface operated by mouse and keyboard. The program includes an editor with basic editing facilities, such as creating a new text or opening an existing one, cutting and pasting, saving and printing. The editor can also open a corpus file that lists parallel texts, select a pair from the list and align the two texts. The corpus file also contains the delimiters that mark paragraph and sentence boundaries within the corpus. When two separate files are aligned, the user can enter these delimiters in a dedicated delimiter dialog box.
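The corpus-file format is not described here, so the following is only a sketch of how user-supplied delimiters could be applied. The delimiter strings `<p>` (paragraph) and `<s>` (sentence) and the function name `segment` are hypothetical.

```python
from typing import List


def segment(text: str, para_delim: str, sent_delim: str) -> List[List[str]]:
    """Split text into paragraphs, then each paragraph into sentences.

    Empty segments (for example, from a trailing delimiter) are dropped and
    surrounding whitespace is stripped from each sentence.
    """
    paragraphs = []
    for para in text.split(para_delim):
        sentences = [s.strip() for s in para.split(sent_delim) if s.strip()]
        if sentences:
            paragraphs.append(sentences)
    return paragraphs


# Hypothetical delimiters: "<p>" between paragraphs, "<s>" between sentences.
print(segment("In the beginning.<s>And the earth.<p>And God said.<s>", "<p>", "<s>"))
# → [['In the beginning.', 'And the earth.'], ['And God said.']]
```

Keeping the paragraph level is a common design choice: an aligner can first pair paragraphs and then align sentences only within each paragraph pair, which keeps the search small and limits how far an error can propagate.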
The alignment output can be presented in several ways. If a misalignment occurs, the user can intervene and correct it manually. The final alignment can be stored in an output file containing either the aligned chunks of text or their aligned offsets.
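Align-it's actual output file format is not given here. As a sketch, assuming alignments are represented as beads of sentence indices, the two kinds of output could be produced as tab-separated lines: either the joined sentence text on each side, or half-open character ranges `start-end` into the original texts. The function names and the line format are hypothetical.

```python
from typing import List, Tuple

Bead = Tuple[List[int], List[int]]
Span = Tuple[int, int]


def sentence_offsets(text: str, sentences: List[str]) -> List[Span]:
    """Locate each sentence in `text`, scanning left to right.

    Raises ValueError if a sentence does not occur verbatim in the text.
    """
    spans, pos = [], 0
    for sent in sentences:
        start = text.index(sent, pos)
        spans.append((start, start + len(sent)))
        pos = start + len(sent)
    return spans


def offset_field(indices: List[int], spans: List[Span]) -> str:
    """Character range covered by a group of sentences, or '-' if empty."""
    if not indices:
        return "-"
    return f"{spans[indices[0]][0]}-{spans[indices[-1]][1]}"


def export_alignment(
    beads: List[Bead],
    src_text: str, src_sents: List[str],
    tgt_text: str, tgt_sents: List[str],
    mode: str = "chunks",
) -> List[str]:
    """One tab-separated line per bead: aligned text chunks or character offsets."""
    if mode not in ("chunks", "offsets"):
        raise ValueError(f"unknown mode: {mode}")
    src_spans = sentence_offsets(src_text, src_sents)
    tgt_spans = sentence_offsets(tgt_text, tgt_sents)
    lines = []
    for s_idx, t_idx in beads:
        if mode == "chunks":
            left = " ".join(src_sents[i] for i in s_idx)
            right = " ".join(tgt_sents[j] for j in t_idx)
        else:
            left = offset_field(s_idx, src_spans)
            right = offset_field(t_idx, tgt_spans)
        lines.append(f"{left}\t{right}")
    return lines


beads = [([0], [0]), ([1], [1, 2])]
args = ("Hi there. Bye now.", ["Hi there.", "Bye now."],
        "Salut. Ciao. Adieu.", ["Salut.", "Ciao.", "Adieu."])
print(export_alignment(beads, *args, mode="chunks"))
# → ['Hi there.\tSalut.', 'Bye now.\tCiao. Adieu.']
print(export_alignment(beads, *args, mode="offsets"))
# → ['0-9\t0-6', '10-18\t7-19']
```

Offsets keep the output compact and leave the original files untouched, while chunks are easier to read or to load directly into a translation memory.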
Align-it also integrates a pre-alignment handler that automatically recognizes sentence boundaries and marks them accordingly.
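How Align-it's handler detects boundaries is not specified. A common simple approach, sketched here as an assumption, treats `.`, `!` or `?` (optionally followed by a closing quote or bracket) plus whitespace as a candidate boundary, accepts it only if the next character is an uppercase letter, a digit or an opening quote/bracket, and rejects it if the preceding token is a known abbreviation. The abbreviation list, the `<s>` marker and the function names are hypothetical.

```python
import re
from typing import List

# Hypothetical, deliberately small list; a real tool needs one per language.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc.", "cf."}
OPENERS = {'"', "'", "(", "["}

# Sentence-final punctuation, optional closing quotes/brackets, then whitespace.
CANDIDATE = re.compile(r"[.!?]+[\"')\]]*\s+")


def split_sentences(text: str) -> List[str]:
    """Split text at candidate boundaries that pass the heuristic checks."""
    pieces, start = [], 0
    for m in CANDIDATE.finditer(text):
        nxt = text[m.end():m.end() + 1]
        # str.isupper also accepts accented and non-Latin capitals.
        if not (nxt.isupper() or nxt.isdigit() or nxt in OPENERS):
            continue
        punct_end = m.start() + len(m.group().rstrip())
        token = text[start:punct_end].split()[-1].lower()
        if token in ABBREVIATIONS:
            continue
        pieces.append(text[start:punct_end])
        start = m.end()
    pieces.append(text[start:])
    return pieces


def mark_boundaries(text: str, marker: str = "<s>") -> str:
    """Replace each detected boundary (and its whitespace) with `marker`."""
    return marker.join(split_sentences(text))


text = "Mr. Smith went to Paris. He arrived at 9 a.m. on Monday! Did he stay? Yes."
print(split_sentences(text))
# → ['Mr. Smith went to Paris.', 'He arrived at 9 a.m. on Monday!', 'Did he stay?', 'Yes.']
```

Rules like these misfire on abbreviations missing from the list or on sentences that start with a lowercase word, which is one reason a user-editable boundary marking step before alignment is useful.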
Contact person: Stelios Piperidis