eSpeak - American English

jsd@xxxxxxxxxxx (Jonathan Duddington) · Fri, 12 May 2006 23:10:39 +0100

In article <20060512204050.75077.qmail at web38611.mail.mud.yahoo.com>,
   Arthur Pirika <arfy8820 at yahoo.com.au> wrote:
> Well, thanks to those who helped out here -- working
> perfectly now! I'd be interested if Slackware's the
> only distro affected by this, or do others have to add
> -lpthread to the make file? Gentoo/Fedora/Debian users?

eSpeak doesn't use pthread so it shouldn't need to link to it.

It seems that the portaudio sound interface does use it, but I don't
see why that should affect the compilation of eSpeak, unless you are
not using portaudio as a shared library, but are instead combining it
into the eSpeak binary as a static library.  Perhaps that's the
difference.

I use a Debian based system.  The "libportaudio0" package contains the
portaudio shared library.  It has a dependency on the "libc6" package,
which is the standard GNU Shared C Library, which includes pthread as
one of its components.

> Also, now that eSpeak's going through its paces, I'm curious as to
> where the samples, phoneme data etc came from?

From my mouth :-)

Unvoiced consonants (e.g. [t] [s] [f]) are simply recorded sound samples.

The vowels and sonorant consonants (e.g. [n] [w] [l]) are generated at
run-time from formant details (peaks in the frequency spectrum).

Voiced consonants (e.g. [d] [z] [v]) are a combination of both these
methods.
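As a rough illustration of the formant method (a Python sketch, not eSpeak's actual code), a vowel can be approximated by passing a pulse train at the pitch frequency through a cascade of second-order resonators, one per formant. The resonator below is the standard Klatt-style design; the function names, the crude impulse-train voice source, and the formant/bandwidth values for an open back vowel like [A:] are illustrative assumptions.

```python
import math

def resonator(signal, freq, bandwidth, sample_rate):
    """Second-order digital resonator centred on freq Hz (Klatt-style).

    y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with a chosen so the DC gain is 1.
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2 * math.pi * bandwidth * t)
    b = 2 * math.exp(-math.pi * bandwidth * t) * math.cos(2 * math.pi * freq * t)
    a = 1 - b - c
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synth_vowel(formants, f0=120.0, duration=0.3, sample_rate=16000):
    """Return samples (peak-normalised to 1.0) for a steady vowel.

    formants: list of (frequency_hz, bandwidth_hz) pairs, applied in cascade.
    """
    n = round(duration * sample_rate)
    period = int(sample_rate / f0)
    # Crude voicing source: one impulse per glottal period (an assumption;
    # real synthesisers use a shaped glottal waveform).
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, sample_rate)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]
```

For example, `synth_vowel([(730, 60), (1090, 90), (2440, 120)])` yields 0.3 s of samples at 16 kHz with spectral peaks near those three (illustrative) formant frequencies. Changing the formant values is what "adjusting the vowels" amounts to in this kind of model.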

There is some information at http://espeak.sourceforge.net/docindex.html

> Would it be possible to either implement an American English voice
> using the existing data or create one in the future? Just some
> thoughts.

Yes, but I don't speak American so I'll leave that to someone else.

You'd need to make some adjustments to the vowels and the relative
lengths of phonemes and syllables, but a bigger problem (given the
complexity of English pronunciation rules and exceptions) would be the
different spelling to phoneme translation.

e.g.:        British       American
  cart       [kA:t]        [kA:rt]
  here       [hI@]         [hIr] or perhaps [hI@r]
  city       [sItI]        [sI*i]
                      (where [*] is a sort of degenerate [d] sound)

The [*] phoneme is not currently present in eSpeak, and post-vocalic
[r] (i.e. an [r] that follows a vowel and is not itself followed by
one, as in "cart") needs some work.
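To give a feel for the kind of spelling rules an American voice would need, here is a toy Python sketch of two checks suggested by the examples above: spotting a post-vocalic "r" in the spelling (pronounced in American, silent in British) and spotting a "t" between vowel letters (a candidate for the flapped [*]). The function names and the rules themselves are hypothetical illustrations, not eSpeak's rule-file format.

```python
VOWEL_LETTERS = set("aeiouy")

def has_postvocalic_r(word):
    """True if an 'r' follows a vowel letter and is not followed by a
    pronounced vowel (toy rule: next letter is a consonant, the word ends,
    or only a silent final 'e' follows, as in 'here')."""
    w = word.lower()
    for i, ch in enumerate(w):
        if ch != "r" or i == 0 or w[i - 1] not in VOWEL_LETTERS:
            continue
        rest = w[i + 1:]
        if rest in ("", "e") or rest[0] not in VOWEL_LETTERS:
            return True
    return False

def has_flappable_t(word):
    """True if a single 't' sits between two vowel letters, as in 'city'.
    Ignores stress, which a real rule must check (flapping occurs before
    an unstressed vowel)."""
    w = word.lower()
    return any(
        w[i] == "t" and w[i - 1] in VOWEL_LETTERS and w[i + 1] in VOWEL_LETTERS
        for i in range(1, len(w) - 1)
    )

for word in ("cart", "here", "city", "very", "red"):
    print(word, has_postvocalic_r(word), has_flappable_t(word))
```

Even these two rules misfire without stress information and exception lists, which is why the spelling-to-phoneme translation is the harder part of the job.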

Adjusting vowels and adding new phonemes will need my vowel-editing and
phoneme data compilation program, which I haven't released yet. It
needs some tidying up and, more significantly, instructions on how to
use it!  But if anyone is seriously interested, let me know.