Speech-to-Text Technology as a Documentation Tool for Interpreters: a new approach to compiling an ad hoc corpus and extracting terminology from video-recorded speeches

Authors

  • Mahmoud Gaber, Universidad de Málaga, Spain
  • Gloria Corpas Pastor, Universidad de Málaga, Spain
  • Ahmed Omer, University of Wolverhampton, United Kingdom

DOI:

https://doi.org/10.24310/TRANS.2020.v0i24.7876

Keywords:

Automatic transcription, computer-assisted interpreting tools, terminology extraction, ad hoc corpus, interpreting technologies

Abstract

Although interpreting has not benefited from technological developments to the same extent as translation, there is now growing interest in developing solutions tailored to interpreters' needs. In particular, Automatic Speech Recognition (ASR) is beginning to be used in computer-assisted interpreting tools, either as a component of such systems or as a stand-alone application. This study pursues three main objectives: (i) to determine the most suitable automatic transcription tool for compiling ad hoc corpora, by comparing several automatic transcription systems and evaluating their performance; (ii) to use ASR to extract terminology from transcriptions of video-recorded speeches; and (iii) to promote the use of ASR as a new documentation tool in interpreting. This is one of the first studies to address the possibilities that speech-to-text technology offers for meeting interpreters' terminological and documentation needs when preparing for a given assignment.
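
As an illustration of what comparing transcription systems can involve, the sketch below computes word error rate (WER), a widely used measure of ASR accuracy. The abstract does not state which metric the study applies, so the choice of WER is an assumption, and the function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: the tokenisation (lowercasing and whitespace splitting)
    is an assumption, not the procedure used in the study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical human transcript versus the output of one ASR system.
manual = "the interpreter prepares the glossary before the conference"
automatic = "the interpreter prepares a glossary before conference"
print(f"WER: {word_error_rate(manual, automatic):.3f}")
```

A lower value means the automatic transcript is closer to the human reference. Under this kind of comparison, the same reference would be scored against the output of each candidate system.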

Author biographies

Mahmoud Gaber, Universidad de Málaga

Degree in Hispanic Philology from Al-Azhar University, Egypt. Master's degree in Intercultural Communication, Public Service Interpreting and Translation from the Universidad de Alcalá de Henares, Spain. Sworn translator accredited by the Ministry of Justice, United Arab Emirates. He completed a research stay at the Research Group in Computational Linguistics, University of Wolverhampton, UK. He currently holds a predoctoral fellowship for doctoral training under the State Programme for the Promotion of Talent and its Employability of the Spanish Ministry of Economy, Industry and Competitiveness.

Gloria Corpas Pastor, Universidad de Málaga

PhD in English Philology from the Universidad Complutense de Madrid (1994). Visiting Professor in Translation Technology at the Research Institute in Information and Language Processing (RIILP) of the University of Wolverhampton, UK (since 2007), and Professor in Translation and Interpreting (2008). Spanish delegate for AEN/CTN 174 and CEN/BTTF 138, actively involved in the development of the UNE-EN 15038:2006 and currently involved in the future ISO Standard (ISO TC37/SC2-WG6 "Translation and Interpreting"), where she is in charge of ISO/AWI 20539 (Translation, interpreting and related technology – Vocabulary) and ISO/DIS 18841 (Interpreting services – General requirements and recommendations). President of the Evaluation and Verification Commission of the Arts and Humanities field at Fundación para el Conocimiento Madri+d and member of the specific committee for doctorate programmes at AQU. She is currently Director of the Department of Translation and Interpreting of the University of Malaga.

Ahmed Omer, University of Wolverhampton

PhD researcher in Computational Linguistics, Research Institute in Information and Language Processing, University of Wolverhampton, UK. MSc in Information Technology, Heriot-Watt University, Edinburgh, UK. BSc in Computer Science, Heriot-Watt University, Edinburgh, UK. Corpus Workbench Programmer, Research Institute in Information and Language Processing, University of Wolverhampton, UK.

Published

2020-12-22

How to cite

Gaber, M., Corpas Pastor, G., & Omer, A. (2020). Speech-to-Text Technology as a Documentation Tool for Interpreters: a new approach to compiling an ad hoc corpus and extracting terminology from video-recorded speeches. TRANS: Revista De Traductología, (24), 263–281. https://doi.org/10.24310/TRANS.2020.v0i24.7876

Section

Miscellany