Speech synthesis based on Gaussian conditional random fields

Khorram, S ; Sharif University of Technology

Type of Document: Article
DOI: 10.1007/978-3-319-10849-0_19
Abstract:
Hidden Markov Model (HMM)-based speech synthesis (HTS) has recently been confirmed to be the most effective method for generating natural speech. However, it lacks adequate context generalization when the training data is limited. As a solution, the current study provides a new context-dependent speech modeling framework based on Gaussian Conditional Random Field (GCRF) theory. By applying this model, an innovative speech synthesis system has been developed, which can be viewed as an extension of the Context-Dependent Hidden Semi-Markov Model (CD-HSMM). A novel Viterbi decoder along with a stochastic gradient ascent algorithm was applied to train the model parameters. Also, a fast and efficient parameter generation algorithm was derived for the synthesis part. Experimental results using objective and subjective criteria show that the proposed system substantially outperforms HSMM on limited speech databases. Moreover, the Mel-cepstral distance of the spectral parameters is reduced considerably for any size of training database.
Keywords:
Gaussian conditional random field ; HSMM extension ; Statistical parametric speech synthesis ; Conditional random field ; Gaussians
Source: Communications in Computer and Information Science ; Vol. 427, 2014 , p. 183-193
URL: http://link.springer.com/chapter/10.1007%2F978-3-319-10849-0_19