Automatic assistant for unattended production of subtitles
Our proposal describes, from a technical point of view, the design and architecture of a prototype system that assists in the production of subtitles, for both live subtitling and closed captioning. The system receives the source text (the original script of the film or news item) and carries out parsing, part-of-speech tagging, syntactic analysis and rule-based processing to generate output divided into the most suitable subtitle lines. In addition, the system filters out annotations that are of no use in subtitles and runs a spelling, grammar and style check on the original text to correct possible errors.
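As an illustration of the annotation-filtering step only, the following minimal Python sketch removes cues from a script line. It assumes that annotations appear as bracketed or parenthesised text such as "[Música]" or "(en off)"; that convention, the pattern ANNOTATION and the function strip_annotations are hypothetical and are not taken from the prototype.

```python
import re

# Assumption: annotations are bracketed or parenthesised cues.
# The prototype's real annotation format is not described in the abstract.
ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_annotations(text: str) -> str:
    """Remove bracketed cues and tidy the whitespace they leave behind."""
    cleaned = ANNOTATION.sub(" ", text)
    # Re-attach punctuation that was separated from its word by a removed cue.
    cleaned = re.sub(r"\s+([,.;:!?])", r"\1", cleaned)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", cleaned).strip()


print(strip_annotations("[Música] Buenos días (en off), señores."))
```

A real script would probably need a richer notion of annotation (speaker labels, scene headings, timing marks), so this pattern should be read as a placeholder.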
In our presentation, we will place special emphasis on the possibilities of these technologies and point out their current limitations. Although the system has been implemented for the Spanish language only, a similar logical workflow is also valid for other Western languages such as English, French or German.
The main purpose of the system is to simplify subtitle creation, increase productivity and ensure compliance with the standards and recommendations for best practice, specifically the Spanish National Standard UNE 153010 (“Subtitling for deaf and hard-of-hearing people; Subtitling through teletext”), published by the Spanish Association for Standardisation and Certification (AENOR) in September 2003. The rules implemented cover control of the recommended line length; division of phrases at punctuation marks such as commas or full stops and at stop words such as prepositions or conjunctions; and substitution of special symbols such as ‘euro’ or ‘dollar’ and of frequent abbreviations or acronyms (EU, NATO, etc.). The prototype has been beta-tested and validated by the Spanish Centre for Subtitling and Audio Description (CESyA), the public institution dedicated to promoting audio description for disabled people and to encouraging accessibility in the Spanish audiovisual arena. In addition, it has been integrated into a well-known application for the preparation of subtitle files and installed for user testing and validation at the teletext unit of RTVE (the state-owned Spanish public radio and television service).
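The kind of rule described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the prototype's implementation: the tables SYMBOLS and STOP_WORDS, the function names, and the default limit of 37 characters per line are all hypothetical placeholders, and the actual system also relies on part-of-speech tagging and syntactic analysis, which are omitted here.

```python
import re

# Hypothetical rule tables; the prototype's real tables are not given in the text.
SYMBOLS = {"€": "euros", "$": "dólares"}
STOP_WORDS = {"y", "o", "pero", "que", "de", "en", "con", "para", "por", "a"}
PUNCTUATION = ",.;:!?"


def substitute_symbols(text):
    """Spell out symbols that follow an amount, e.g. '50€' -> '50 euros'."""
    for symbol, word in SYMBOLS.items():
        text = re.sub(r"\s*" + re.escape(symbol), " " + word, text)
    return text.strip()


def split_point(words, max_len):
    """Return how many leading words should go on the next subtitle line."""
    fit, length = 0, -1  # -1 because the first word has no preceding space
    for word in words:
        if length + 1 + len(word) > max_len:
            break
        length += 1 + len(word)
        fit += 1
    if fit == len(words):
        return fit  # everything that remains fits on one line
    if fit == 0:
        return 1  # a single over-long word is kept whole
    # First preference: break just after a punctuation mark.
    for i in range(fit, 0, -1):
        if words[i - 1][-1] in PUNCTUATION:
            return i
    # Second preference: break just before a stop word.
    for i in range(fit, 0, -1):
        if words[i].lower() in STOP_WORDS:
            return i
    return fit  # fall back to the longest prefix that fits


def split_subtitle(text, max_len=37):
    """Split text into lines of at most max_len characters (assumed limit)."""
    words = substitute_symbols(text).split()
    lines = []
    while words:
        n = split_point(words, max_len)
        lines.append(" ".join(words[:n]))
        words = words[n:]
    return lines


for line in split_subtitle(
    "El precio subió un 3% en la Unión Europea, pero los salarios no crecieron."
):
    print(line)
```

In this sketch punctuation always takes priority over stop words, which can produce a very short line when the only punctuation mark falls early; a fuller rule set would presumably weigh line balance as well.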
José Carlos GONZÁLEZ-CRISTÓBAL
Daedalus S.A., Spain
jgonzalez@daedalus.es
Julio VILLENA-ROMÁN
Daedalus S.A., Spain
jvillena@daedalus.es
José Carlos GONZÁLEZ-CRISTÓBAL has been the managing director of DAEDALUS, S.A. since its foundation in 1998. He holds a PhD in Telecommunication Engineering from the Technical University of Madrid (Universidad Politécnica de Madrid, UPM), Spain. He has been an assistant professor at the High Technical School of Telecommunication Engineering (Escuela Técnica Superior de Ingenieros de Telecomunicación) of the UPM since 1985. From 1996 to 1998, he held the post of technical director of CITAM AIE, a company founded in 1993 by UPM (along with the companies TELEFÓNICA, ALCATEL, PRISA and INDRA). Since 2003, he has also been a visiting professor at the Department of Computer Science, Royal Holloway, University of London (United Kingdom). Since 2002, he has been the president of the Spanish Chapter of the IEEE Computer Society.
Julio VILLENA-ROMÁN is a telecommunications engineer from the Technical University of Madrid (Universidad Politécnica de Madrid, UPM), Spain, where he obtained his degree in 1997. He is a founding partner and the director of technology at DAEDALUS. He has been an associate professor at the Department of Telematics Engineering, Carlos III University, Madrid (Spain) since 2002. He has led research projects in the field of Intelligent Systems and has presented over 50 scientific papers at national and international conferences.