On 13 Jul, Hynek Hanke <hanke at brailcom.org> wrote:
> It would be great if somebody who thinks that Festival
> is actually worse than eSpeak in quality of speech
> could try to elaborate more about the reasons.

It depends what you mean by "quality". There is no doubt that the good Festival voices sound more human than eSpeak.

I'm not blind, but I use text-to-speech a lot for reading blogs, news articles, etc. The main reasons why I prefer to listen to eSpeak rather than Festival are:

1. Clarity. The eSpeak voice (I use British English) sounds clearer, sharper, and more articulated. An alternative description might be "artificial and harsh".

The perceived quality of eSpeak may depend on your loudspeakers. I use a domestic sound system with big speakers and it sounds good to me. But eSpeak has less "bass" and more mid-frequencies than other synthesizers, and perhaps that's less suitable for small computer speakers, where it sounds more "harsh"? People have experimented with new eSpeak "voice variants" that change the "tone" and "formant" parameters to alter the tonal balance.

2. Intonation (the changes in pitch during a sentence). Festival seems more "flat" or "boring". I prefer eSpeak's more lively intonation (although that may not sound good for some languages). Perhaps it's possible to make a new, improved intonation algorithm in Festival.

Note that you can use eSpeak as a front-end to an Mbrola diphone voice, so you get eSpeak's intonation with a more natural-sounding voice (intonation with Mbrola was improved in eSpeak version 1.31 and later). See http://espeak.sf.net/mbrola.html

Try comparing Festival with eSpeak+Mbrola.

> This is why eSpeak is the current default in Speech Dispatcher
> because it is initially easier to get running and it covers a great
> span of languages. The documentation however strongly suggests
> users whose language is supported by Festival to try it as their
> primary synthesizer for a better voice quality.

That is good advice, especially since the quality of different languages in eSpeak is very variable.
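
For anyone who wants to try the Festival vs. eSpeak vs. eSpeak+Mbrola comparison, here is a rough Python sketch that speaks the same sentence through each setup in turn. It is only an illustration under some assumptions: the helper names (build_commands, speak_all) are made up for this example, the voice names "en" and "mb-en1" depend on what is installed on your system, and it assumes an eSpeak build that drives Mbrola itself (with older releases the output of the mb-* voice may need to be piped into the mbrola program, as described on the page linked above).

```python
import shutil
import subprocess


def build_commands(text, espeak_voice="en", mbrola_voice="mb-en1"):
    """Return (label, argv, stdin_text) triples, one per synthesizer setup.

    The voice names are assumptions; substitute whatever is installed.
    """
    return [
        # Plain eSpeak formant synthesis.
        ("espeak", ["espeak", "-v", espeak_voice, text], None),
        # eSpeak as front-end to an Mbrola diphone voice.
        ("espeak+mbrola", ["espeak", "-v", mbrola_voice, text], None),
        # Festival reads the text to speak from standard input.
        ("festival", ["festival", "--tts"], text),
    ]


def speak_all(text):
    """Speak the same text with each setup, skipping missing programs."""
    for label, argv, stdin_text in build_commands(text):
        if shutil.which(argv[0]) is None:
            print(f"{label}: {argv[0]} not found, skipping")
            continue
        print(f"--- {label} ---")
        # check=False: a missing voice should not abort the comparison.
        subprocess.run(argv, input=stdin_text, text=True, check=False)
```

Calling speak_all("The quick brown fox jumps over the lazy dog.") plays the sentence once per setup, so differences in clarity and intonation can be judged back to back on the same loudspeakers.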