Miku Nishihara¹*, Dan Wells²*, Korin Richmond², Aidan Pine³
¹Department of Computer Science, Nagoya Institute of Technology, Nagoya, Japan
²The Centre for Speech Technology Research, University of Edinburgh, United Kingdom
³National Research Council Canada, Canada
*Equal contribution
Global style tokens (GSTs) allow for rich modelling of the variation in a speech corpus and subsequent control of text-to-speech synthesis (TTS). However, certain styles of speech may be marked by variation along multiple dimensions, complicating the interpretation and control of learned style tokens. One example is hyperarticulated or 'clear' speech, such as that directed toward listeners with hearing impairments or language learners in the classroom, which in English is characterised by reduced speaking rate, increased F0, more careful articulation of vowels and plosive consonants, and other factors. We present a method for simplifying control of style tokens by applying principal components analysis (PCA) to GST weights from a TTS system trained on both plain and clear speech. We identify the axes of variation in PCA space with the acoustic correlates of clear speech in English and show that we can synthesise either style by moving along a single dimension in that space.
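The idea of controlling style by moving along one PCA dimension can be illustrated with a minimal sketch. It is not the authors' implementation: the GST weights here are synthetic stand-ins, and the number of tokens (10), the helper names `fit_pca` and `weights_at`, and the choice to measure positions in standard deviations along a component are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(weights):
    """Return the mean, principal axes (as rows), and per-axis standard deviation."""
    mean = weights.mean(axis=0)
    centered = weights - mean
    # SVD of the centered data: the rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    std = s / np.sqrt(len(weights) - 1)
    return mean, vt, std

def weights_at(mean, axes, std, position, component=0):
    """Weight vector `position` standard deviations along one principal axis.

    Assumption: positions are expressed in standard deviations; the source
    does not state the units used for its settings.
    """
    return mean + position * std[component] * axes[component]

# Synthetic stand-in for GST weights predicted on plain and clear utterances.
rng = np.random.default_rng(0)
n_tokens = 10  # hypothetical number of style tokens
plain_centre = rng.dirichlet(np.ones(n_tokens))
clear_centre = rng.dirichlet(np.ones(n_tokens))
plain = plain_centre + 0.01 * rng.standard_normal((200, n_tokens))
clear = clear_centre + 0.01 * rng.standard_normal((200, n_tokens))
weights = np.vstack([plain, clear])

mean, axes, std = fit_pca(weights)

# Project every utterance onto the first principal component.
scores = (weights - mean) @ axes[0]
plain_mean_score = scores[:200].mean()
clear_mean_score = scores[200:].mean()
print("mean PC1 score, plain:", plain_mean_score)
print("mean PC1 score, clear:", clear_mean_score)

# Weight vectors at two opposite positions along the first component,
# which could then be passed to the synthesiser in place of predicted weights.
w_pos = weights_at(mean, axes, std, +3.0)
w_neg = weights_at(mean, axes, std, -3.0)
```

In this toy setup the two styles fall on opposite sides of the first component, so a single signed position is enough to select one or the other. Which sign corresponds to which style depends on the arbitrary orientation of the component.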
Style | Settings for synthesis | British officials said they could not say. | Who is running football in this country? | I want to stay as normal as possible. |
---|---|---|---|---|
Plain | P Natural | | | |
Plain | P 1-hot | | | |
Plain | P GST mean | | | |
Plain | P GST +3 | | | |
Clear | C Natural | | | |
Clear | C 1-hot | | | |
Clear | C GST mean | | | |
Clear | C GST -3 | | | |
C Natural and P Natural (the gray rows) are not included in the style controllability experiment in the paper.
Style | Settings for synthesis | There was pressure from elsewhere too. | This was very important to me. | But what can you do? |
---|---|---|---|---|
Clear | C Natural | | | |
Clear | C GST -3 | | | |
Clear | C GST -2 | | | |
Clear | C GST mean | | | |
Clear | C GST -1 | | | |
Neutral | GST 0 | | | |
Plain | P GST +1 | | | |
Plain | P GST mean | | | |
Plain | P GST +2 | | | |
Plain | P GST +3 | | | |
Plain | P Natural | | | |