During the last decade, speech synthesis technologies have evolved from concatenative unit-selection techniques to statistical parametric ones. Currently, the most prominent statistical parametric technique is HMM-based speech synthesis. In this approach, hidden Markov models are used to model phonetic units, and an estimation criterion such as maximum likelihood is used for both acoustic model training and speech parameter generation. The main advantage of this technique is its great flexibility in modifying speech characteristics through adaptation techniques. Beyond changing speaker identity, this flexibility makes it possible to obtain synthetic voices with different speaking styles, emotions, etc., without the costly process of recording new speakers or speaking styles. Other benefits include a small memory footprint and more consistent synthetic speech quality. The emergence of statistical parametric synthesis has made new applications possible, such as personalized speech-to-speech translation and voice reconstruction for speech-impaired people. This presentation gives a brief overview of HMM-based speech synthesis and presents some work carried out in the field of speaker adaptation.
Carmen Magariños received her degree in Telecommunications Engineering in 2011 and her MSc in Signal Theory and Communications in 2014, both from the University of Vigo (Spain). In 2011 she joined the Multimedia Technology Group (GTM), where she has worked as a research engineer on speech synthesis. She is currently a PhD student at the GTM, working within the SpeechTech4All research project. Her research interests focus on speech technology, mainly HMM-based speech synthesis, hybrid models, and speaker adaptation.