In article <Pine.LNX.4.62.0604130852320.28756 at localhost.localdomain>, Willem van der Walt <wvdwalt at csir.co.za> wrote:

> I think there is a bug in the espeak program when reading long text
> files. To me it is no problem as I am using speech-dispatcher which
> sends smaller chunks of text at a time, but others might be using
> the file feature. The program segfaults after a time.

That's puzzling, and worrying. I often speak large text files with

   speak -f textfile

and I've not had any problem like that. Were you using any other options?

> I am interested in the process of creating a new language using
> espeak. Where can I get more detail on that?

Firstly, read the Documents section of the web site (http://espeak.sourceforge.net): Dictionary, Phonemes, and Phoneme Tables (and download phonsource.zip, which is referenced from there).

There is another program, which I haven't released yet, which compiles the Phoneme Tables together with sound recordings of consonants and formant specifications of vowels. It also includes an editor for the vowels. The interface needs tidying up a bit, but the biggest job is writing user instructions so that others can use it. I hope to do this, though. If you want to try it out without instructions, I could make it available fairly soon :-)

It would be very interesting if someone did do another language implementation. It would help to identify language-dependent features. Depending on which language, I might need to add some new features to the speech engine.

Firstly you need to get a phonological description of your language (eg. which phonemes it uses). Looking up "yourlanguage language" in Wikipedia might give some useful information. It may be that, as a first approximation, you can use already-provided phonemes from the list in the Phonemes.html document.
You can try out example words in your new language by giving phoneme codes enclosed within double square brackets, eg:

   speak "[[h@l'oU w'3:ld]]"

would say "hello world" in English, and

   speak "[[g'yt@n t'A:g]]"

would say "güten tag" in German, using the [y] phoneme, which isn't used in English but is already provided in eSpeak. Perhaps you can find a set of passable phonemes for your language (you can implement more accurate versions later). A Bantu language would be more of a challenge (eg. a tonal language, click consonants).

Then you can start constructing pronunciation rules in the <language>_rules file. The <language>_list file gives exceptions, and also those common words which are usually unstressed ("the", "is", "in", etc). See the "data/" directory in eSpeak's "source.zip" download package for examples. Hopefully your language's spelling rules won't be as difficult as English!

Set up a Voice in espeak-data/voices for your language (specifying your language's dictionary, but keeping the default phoneme set for now) and compile the dictionary files with

   speak --compile=yourvoice

That should give you a very rudimentary implementation of your language. It might even be intelligible :-)

eSpeak is written in C++. You can write a new Translator class for your language which can keep the functions of the base Translator class, or can set options to vary their effect (eg. which syllable of a word usually takes the main stress, or the difference in length between stressed and unstressed syllables), or can override them with replacement functions (eg. a new sentence intonation algorithm).

Now your language should be sounding better. As you listen to it speaking, notice problems and make adjustments to the rules, the phoneme realizations, and the various tuning parameters.

If you're serious about implementing a language, then I'll be happy to help with support, program features, information and documentation.
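To make the Voice setup step concrete, a minimal voice file along these lines should be enough to start with (a sketch only: "xx" and "mylanguage" are placeholder names, and you should check the attribute list in the Voices documentation for the exact set your eSpeak version supports):

```
// espeak-data/voices/xx  -- minimal starting voice (names are placeholders)
name mylanguage
language xx

// use your own dictionary files (xx_rules, xx_list),
// but keep the default phoneme table for now
dictionary xx
```

With that file in place, "speak --compile=xx" should compile xx_rules and xx_list from the dictsource directory into the dictionary data that the voice uses.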