Mediaforall

Evaluating the intelligibility and acceptability of automatically generated subtitles from the end-users’ perspective

Recent research has proposed that language technologies be incorporated into the subtitling process (Gambier 2008, MUSA, eTITLE, MovRat). Within this line of research, automatically generated subtitles have been evaluated using automatic metrics, a common practice in the MT research community (Volk & Harder 2007, Volk 2008). To date, however, there has been no comprehensive human end-user evaluation of automatically generated subtitles. This paper presents the findings of individual evaluation sessions conducted with 44 subjects, all native speakers of German, who viewed six clips taken from a Harry Potter movie: three clips had an English-language soundtrack, three had a Dutch-language soundtrack, and all six carried German subtitles generated by an Example-based Machine Translation (EBMT) system. Without being told, the subjects were split into three groups: 15 subjects viewed clips with subtitles generated using Corpus A; 15 viewed clips with subtitles generated using Corpus B; and 14 viewed clips with subtitles generated using Corpus C.
The three corpora differed in size, homogeneity, and the number of source-language repetitions they contained. The research examined whether any of these three variables influenced the intelligibility (measured through the comprehensibility and the readability of the subtitles) and the acceptability (measured through the style and the well-formedness of the subtitles) of automatically generated subtitles from the end-users’ perspective. The methodology was a questionnaire interview: the subject viewed a movie clip, and the researcher then asked a combination of open and closed questions relating to the four quality characteristics named above (comprehensibility, readability, style and well-formedness). This process continued until all six clips had been viewed. Subjects were also asked more general questions about their educational background, AVT preferences, subtitling and Harry Potter. The results were analysed quantitatively and qualitatively, and translation quality scores were also calculated using automatic metrics.
The quantitative findings of the study did not reveal any statistically significant differences that would allow us to rank the corpora in terms of intelligibility and acceptability. However, they did highlight possible influences of subjects’ linguistic background and prior knowledge on their evaluation of subtitles. The qualitative results suggested that Corpus C generated more readable subtitles and the highest levels of overall satisfaction. More importantly, the qualitative data showed that end-users accept MT-generated subtitles in certain contexts, even when the quality of the subtitles is rated as low (both by the end-users and by the automatic metric scores).

References
Gambier, Yves. (2008). Recent developments and challenges in audiovisual translation research IN: D. Chiaro, C. Heiss and C. Bucaria (eds.) Between Text and Image. Amsterdam: John Benjamins, pp. 11-33.
eTITLE: Melero, Maite, Antoni Oliver and Toni Badia. (2006). Automatic multilingual subtitling in the eTITLE project. IN: Proceedings of Translating and the Computer 28, London, England: ASLIB. No pagination.
MovRat: Armstrong, Stephen, Colm Caffrey, Marian Flanagan, Dorothy Kenny, Minako O’Hagan and Andy Way. (2006). Leading by Example: Automatic Translation of Subtitles via EBMT? Perspectives, 14 (3), pp. 163-184.
MUSA: <http://sifnos.ilsp.gr/musa>.
Volk, M. (2008). The Automatic Translation of Film Subtitles. A Machine Translation Success Story? IN: J. Nivre, M. Dahllöf, and B. Megyesi (eds.) Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein. Sweden: Uppsala University. pp. 202-214.
Volk, M. and S. Harder. (2007). Evaluating MT with Translations or Translators. What is the Difference? IN: Proceedings of MT Summit XI, Copenhagen, Denmark. pp. 499-506.

Marian FLANAGAN
Dublin City University, Ireland
marian.flanagan@dcu.ie

Marian FLANAGAN is in the final stages of her PhD, which focuses primarily on evaluation methods for automated subtitles in the domain of screen translation. The study combines corpus-analysis techniques with an end-user evaluation phase of EBMT subtitles. She is a Government of Ireland Scholar (IRCHSS) 2008/09. She graduated from Trinity College Dublin in 2001 with a BA (Mod) in Computer Science, Linguistics and German, and from Dublin City University in 2003 with an MA in Translation Studies.