CAPTivating: Comparative Analysis of Public Speaking with Text-to-Speech
Captivating an audience means attracting and holding the listeners' attention by being very interesting, exciting or pleasant. A paradigm shift in speech synthesis, combined with our research group's own work on leveraging real-life speech data, now makes it possible to mimic the characteristics of a captivating speaker using realistic-sounding text-to-speech (TTS). At Interspeech 2019, our paper "Off the Cuff: Exploring Extemporaneous Speech Delivery with TTS" received the Best Demo Award. It demonstrated the capabilities of one of our spontaneous synthetic voices through an interactive interface for navigating different versions of resynthesized utterances from two keynote speeches. The aim of this project is to employ this tool for research in linguistics and speech analysis, specifically to study public speaking. The proposed method uses comparative perceptual experiments with spontaneous speech synthesis to systematically vary individual speech features and measure both their direct and their combined perceptual impact. We will control breathing, vocal effort, prosody and hesitations in our TTS in order to study their effect on listeners' perception, memory, recall and cognitive load, measured with multimodal sensors. Lastly, we will compare and contrast the impact of these variations in public speaking between Swedish and English. To make this possible, we will create the first TTS built from Swedish spontaneous speech.