using limited domain synthesizers

The goal of building such limited domain synthesizers is not just to show off good synthesis. We followed this route because we see it as a very practical method for building speech output systems.

For practical reasons, the default configuration includes the possibility of a back-up voice that will be called to do the synthesis if the limited domain synthesizer fails, which for this default setup means the phrase includes an out-of-vocabulary word. It would perhaps be more useful if the fallback only required synthesis of the out-of-vocabulary word itself rather than the whole phrase, but that is not as trivial as it might seem. Limited domain synthesis does no prosody modification of the selections, except for pitch smoothing at the joins, so slotting in a single diphone-synthesized word would sound very bad. At present each limited domain synthesizer has an explicitly defined closest_voice. This voice is used when the limited domain synthesis fails, and also when generating the prompts, which can be looked upon as the absolute minimal case, where the synthesizer has no data to synthesize from.
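The whole-phrase fallback behavior can be sketched as follows. This is an illustration of the control flow only, not Festival's actual API; the function names and the toy vocabulary are hypothetical.

```python
# Sketch of the fallback logic: if a phrase contains any out-of-vocabulary
# word, the whole phrase is handed to the backup voice. All names here are
# hypothetical illustrations, not Festival's real API.

LIMITED_DOMAIN_VOCAB = {"the", "time", "is", "now", "ten", "past", "two"}

def synthesize_limited_domain(phrase):
    """Stand-in for limited domain synthesis (in-vocabulary phrases only)."""
    return f"[ldom audio for: {phrase}]"

def synthesize_backup_voice(phrase):
    """Stand-in for the general-purpose (e.g. diphone) backup voice."""
    return f"[diphone audio for: {phrase}]"

def synthesize(phrase):
    words = phrase.lower().split()
    if all(w in LIMITED_DOMAIN_VOCAB for w in words):
        return synthesize_limited_domain(phrase)
    # Any out-of-vocabulary word forces the *whole* phrase to the backup
    # voice: splicing one diphone word into otherwise unmodified selected
    # units would sound very bad.
    return synthesize_backup_voice(phrase)

print(synthesize("the time is now ten past two"))   # in vocabulary
print(synthesize("the time is now half past two"))  # "half" is OOV
```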

There are also speed issues here, which we are still trying to improve. This technique should in fact be fast, but it is still slower than our diphone synthesizer. One significant reason is the cost of finding the optimal join point in the selected units. Also, this synthesis technique requires more memory than diphones, as the cepstral parameters for the whole database are required at run time, in addition to the full waveforms. We feel these issues can and should be addressed, as these techniques are not fundamentally computationally expensive, and we intend to work on these aspects in later releases.
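To illustrate where the join point search cost comes from, the sketch below picks the pair of cepstral frames, one near the end of one unit and one near the start of the next, with the smallest Euclidean distance. This is a minimal sketch of the general technique under our own assumptions (frame window size, distance measure), not Festival's implementation; it also shows why the cepstral parameters must be available at run time.

```python
import math

# Sketch of optimal join point selection between two candidate units,
# using Euclidean distance between cepstral coefficient frames. Hypothetical
# illustration of the technique, not Festival's actual code.

def cepstral_distance(frame_a, frame_b):
    """Euclidean distance between two cepstral coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)))

def best_join_point(unit_a_frames, unit_b_frames, search_width=3):
    """Search the last `search_width` frames of unit A against the first
    `search_width` frames of unit B; return the closest (i, j) frame pair
    and its cost. The nested loop over every join is what makes this
    search expensive at run time."""
    tail = unit_a_frames[-search_width:]
    head = unit_b_frames[:search_width]
    best, best_cost = None, float("inf")
    for i, fa in enumerate(tail):
        for j, fb in enumerate(head):
            cost = cepstral_distance(fa, fb)
            if cost < best_cost:
                best_cost = cost
                best = (len(unit_a_frames) - search_width + i, j)
    return best, best_cost

# Toy two-coefficient cepstral frames for two adjacent units
unit_a = [[0.0, 0.0], [0.5, 0.4], [1.0, 1.0]]
unit_b = [[1.0, 1.1], [0.2, 0.1], [2.0, 2.0]]
print(best_join_point(unit_a, unit_b))  # joins last frame of A to first of B
```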