Speech-to-Text Technology as a Documentation Tool for Interpreters: A New Method for Compiling an Ad Hoc Corpus and Extracting Terminology from Video-Recorded Speeches
DOI: https://doi.org/10.24310/TRANS.2020.v0i24.7876
Keywords: Speech-to-Text, computer-aided interpreting tools, terminology extraction, automatic speech recognition, ad hoc corpus, interpreting technologies.
Abstract
Although interpreting has not yet benefited from technology as much as its sister field, translation, interest in developing tailor-made solutions for interpreters has risen sharply in recent years. In particular, Automatic Speech Recognition (ASR) is being used as a central component of Computer-Assisted Interpreting (CAI) tools, either bundled or standalone. This study pursues three main aims: (i) to identify the most suitable ASR application for building ad hoc corpora by comparing several ASR tools and assessing their performance; (ii) to use ASR to extract terminology from transcriptions of video-recorded speeches, in this case, talks on climate change and adaptation; and (iii) to promote the adoption of ASR as a new documentation tool among interpreters. To the best of our knowledge, this is one of the first studies to explore the potential of Speech-to-Text (S2T) technology for meeting interpreters' preparatory needs regarding terminology and background/domain knowledge.
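The abstract does not say which metric was used to assess ASR performance. As an illustration only, word error rate (WER), a standard ASR metric, counts the word-level substitutions S, deletions D and insertions I needed to turn an ASR transcript into a reference transcript, and divides their sum by the number N of reference words: WER = (S + D + I) / N. The minimal sketch below assumes that definition and computes it with a word-level Levenshtein distance; the function name `word_error_rate` and the lowercase-and-split normalisation are hypothetical choices, not the study's procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Assumption (illustrative only): texts are normalised just by lowercasing
    and splitting on whitespace; punctuation handling is left out.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance between the
    # first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "climate change adaptation requires local action"
    hypothesis = "climate change adaptation require local action now"
    # One substitution plus one insertion over six reference words.
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.333
```

Because insertions are counted, WER can exceed 1.0. It also weights every word equally, so a misrecognised term costs no more than a misrecognised function word, which is worth keeping in mind when the transcripts are later mined for terminology.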
License
All contents published in TRANS. Revista de Traductología are protected under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license. Full details of this license are available at the following link: <http://creativecommons.org/licenses/by-nc-sa/4.0>
Users may copy, use, redistribute, share and publicly display the material as long as:
- The original source and authorship of the material are cited (Journal, Publisher and URL of the work).
- It is not used for commercial purposes.
- The existence of the license and its specifications are mentioned.
- ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
There are two sets of authors’ rights: moral rights and property rights. Moral rights are perpetual prerogatives that cannot be waived, transferred, alienated, extinguished by prescription or seized. In accordance with authors’ rights legislation, TRANS. Revista de Traductología recognizes and respects authors’ moral rights, as well as the ownership of property rights, which will be transferred to the University of Malaga under open access.
Property rights refer to the benefits gained from the use or dissemination of works. TRANS. Revista de Traductología is published in open access and holds an exclusive license, by any means, to carry out or authorize the distribution, dissemination, reproduction, adaptation, translation or arrangement of works.
Authors are responsible for obtaining the necessary permission to use copyrighted images.