A comparative study of ASR performance in different emotional states
Authors: Theologos Athanaselis; Stylianos Bakamidis; Ioannis Dologlou
Book title: Proceedings of the XXVIII-th International Congress of Audiology
Emotional speech recognition rates reported in the literature are low. Although knowledge in the area has developed rapidly, it remains limited in fundamental ways. The first issue is that only a small part of the spectrum of emotionally coloured expressions has been studied. The second is that most research on speech and emotion has focused on recognising the emotion being expressed rather than on the classic Automatic Speech Recognition (ASR) problem of recovering the verbal content of the speech. Read speech, and non-read speech in a ‘careful’ style, can be recognised with accuracy higher than 95% using state-of-the-art speech recognition technology. This paper investigates recognition performance on spontaneous, emotionally coloured speech produced by people holding conversations with the Sensitive Artificial Listener (SAL), a simulated ‘chatbot’ system.