Linux speech w/o special hardware

blinux-list@redhat.com (James R. Van Zandt) · Sun, 24 Mar 2002 21:17:01 -0500 (EST)

In my archives, I found these two messages on emacspeak servers for
Festival.

	    - Jim Van Zandt

From blinux-list@redhat.com  Tue Nov 10 20:09:14 1998
Date: Mon, 09 Nov 1998 05:17:19 -0500
To: blinux-list@redhat.com
From: Bryan Smart <bsmart@pobox.com>
Subject: Re: speech daemon for linux?
In-Reply-To: <3647F625.463D05BD@usa.net>

I am also disappointed in the current state of software synthesis.
Programs like Festival and Mbrola produce some of the most realistic-sounding
speech that I've heard from a computer.  They are, however, too
slow and sluggish to be useful to users of programs like Emacspeak.  I've
been trying to get *some* kind of software synth to be usable enough with
Emacspeak that I could stop lugging my DECtalk along with my laptop,
but have had little luck.

Bart's Mbrola server is reasonably responsive, but Mbrola has many problems
that cause it to hang on certain phrases.  Other problems with his server
cause seemingly random core dumps.  Further, the Mbrola voice does not seem
to be as responsive as the DECtalk.

I wrote a server that used the Festival speech system, and, while that one
was considerably more stable than Bart's server, the Festival system is far
slower than Mbrola.  There is no chance of zipping along with Festival as you
might with your DECtalk.

Lastly, I have hacked together a server that uses Rsynth.  Rsynth is older
and takes far less CPU power than either of the two systems already
mentioned.  Rsynth is still far from perfect, and the code could work much
better than it does.  It inserts long pauses between words and pads the
output (both of these detract from its speed).

I've been working to streamline Rsynth in the hopes that it can perform at
least as well as one of the 80's hardware synths.  I will keep the list
posted.

I am disappointed that more old-school code is not available on this topic.
The old Echo II I used on my Apple IIe back in 1985 was a simple piece of
hardware that was both responsive and understandable (after a time).  The
circuitry on the card could not have been complicated (street price was
$130 back then), and I wonder why more software modeling of these earlier
synths has not been done.  Does anyone know of such code?

Without a software DECtalk for Linux, and with only high-end diphone
synthesizers available, we may be in for a long wait with regard to usable
software synthesis.

Bryan

--
Bryan R. Smart                     Home Phone: 843-953-2721
System Administrator               DCS Mobile: 843-814-7627
Department of Computer Science     E-Mail: bsmart@pobox.com
The College of Charleston          Home Page: http://www.pobox.com/~bsmart

---
Send your message for blinux-list to blinux-list@redhat.com
Blinux software archive at ftp://leb.net/pub/blinux
Blinux web page at http://leb.net/blinux
To unsubscribe send mail to blinux-list-request@redhat.com
with subject line: unsubscribe

From emacspeak-request@cs.vassar.edu  Thu Feb 10 22:03:08 2000
Date: Wed, 09 Feb 2000 15:08:32 +0100
From: Mario Lang <mlang@home.delysid.org>
Subject: Mapping DECtalk commandset to SABLE
To: emacspeak@cs.vassar.edu

Hello.

As some of you probably know, I am in the process of
writing an Emacspeak server for Festival.
I am currently multithreading the whole thing so that it can
drop commands that would never get spoken (such as the long
q, d, s, q, d, s, ... series Emacspeak sends during fast scrolling).
When this is finished (I hope soon; it's quite a brain-teaser for me),
the next step will be voice-lock mode.
I plan to convert the strings Emacspeak sends to the speech server
into SABLE markup for Festival.
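The idea of dropping commands that would never be heard can be sketched as below. This is a minimal illustration, not the server described here. It assumes the server has already read a whole batch of pending command lines before speaking any of them, and it assumes the usual meaning of the three commands named above: `q {text}` queues text, `d` dispatches (speaks) the queue, and `s` stops speech and flushes the queue. The helpers `parse` and `coalesce` are hypothetical; other commands (such as `l`) are simply ignored.

```python
"""Sketch: drop Emacspeak speech-server commands that would never be heard.

Assumes a batch of command lines has been read before any speech starts,
and that q = queue text, d = dispatch queue, s = stop and flush (assumed
semantics). parse() and coalesce() are hypothetical helpers.
"""
import re

# Matches "q {some text}", "d", "s", ...; the braces and text are optional.
_CMD = re.compile(r"^(\w+)(?:\s+\{(.*)\})?\s*$")


def parse(line: str) -> tuple[str, str]:
    """Split a command line into (command, argument); argument may be ''."""
    m = _CMD.match(line.strip())
    if not m:
        raise ValueError(f"unrecognised command: {line!r}")
    return m.group(1), m.group(2) or ""


def coalesce(batch: list[str]) -> tuple[list[str], list[str]]:
    """Return (to_speak, still_queued) for a batch of command lines.

    A stop ('s') silences everything before it, so only commands after
    the last 's' in the batch can produce audible speech.
    """
    cmds = [parse(line) for line in batch]
    # Index just past the last stop command; 0 if the batch has none.
    start = max((i + 1 for i, (c, _) in enumerate(cmds) if c == "s"), default=0)

    to_speak: list[str] = []
    queue: list[str] = []
    for cmd, arg in cmds[start:]:
        if cmd == "q":          # queue text without speaking it yet
            queue.append(arg)
        elif cmd == "d":        # dispatch: everything queued so far is spoken
            to_speak.extend(queue)
            queue.clear()
        # any other command is ignored in this sketch
    return to_speak, queue
```

For a fast-scrolling batch such as `q {line one}`, `d`, `s`, `q {line two}`, `d`, `s`, `q {line three}`, `d`, only `line three` reaches the synthesizer. Scanning for the last `s` first keeps the loop simple: nothing before that point needs to be queued at all.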

First of all, is this the right way? Or does anyone think that
implementing the SABLE commands in a festival-voices.el would be better?

OK, assuming SABLE is the right approach, I am looking for help:
I'd like to discuss with someone how to map the DECtalk command set,
with all its parameters, to appropriate SABLE commands and their parameters.
This can only be done well from the ground up.
I am not at all familiar with DECtalk and don't know what its
many parameters (the numbers) mean. I haven't found any good specs either.
Something to think about:
SABLE supports <SPEAKER></SPEAKER>, <RATE></RATE>, <SAYAS></SAYAS>, and <AUDIO SRC/>.
Those are the main commands to map to.
The speech rate has to be in the range 1 to 99%.
Speakers are given by speaker name.
SAYAS can be used for pronunciation and so on.
AUDIO SRC can be used to include waveforms directly in the spoken text.
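Converting Emacspeak's text into SABLE can be sketched as follows. The tag names come from the list above; the attribute names `NAME` and `SPEED`, the `N%` rate syntax, and the speaker name `male1` are assumptions to check against the SABLE specification and Festival's documentation. The sketch does not attempt the DECtalk-to-SABLE parameter mapping, which is still the open question; it only shows that plain text must be XML-escaped before being wrapped, since characters such as `<` and `&` are common in program source read aloud.

```python
"""Sketch: wrap plain text from Emacspeak in SABLE markup for Festival.

Tag names are from the message; the NAME/SPEED attributes and the "N%"
rate syntax are assumptions. to_sable() is a hypothetical helper.
"""
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def to_sable(text: str, speaker: Optional[str] = None,
             rate_percent: Optional[int] = None) -> str:
    """Return a SABLE document speaking `text` with optional voice settings."""
    body = escape(text)  # '<', '>' and '&' in the text must not parse as markup
    if rate_percent is not None:
        # Clamp to the 1..99% range mentioned in the message.
        rate = min(99, max(1, rate_percent))
        body = f'<RATE SPEED="{rate}%">{body}</RATE>'
    if speaker is not None:
        # quoteattr() adds the surrounding quotes and escapes the value.
        body = f"<SPEAKER NAME={quoteattr(speaker)}>{body}</SPEAKER>"
    return f"<SABLE>{body}</SABLE>"
```

For example, `to_sable("if (a < b)", speaker="male1", rate_percent=60)` nests the escaped text inside a RATE element inside a SPEAKER element, which keeps each voice setting a separate, easily replaceable layer once the DECtalk mapping is settled.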

Another question:
Does anyone know how to reconfigure Emacspeak so that it doesn't use
the play program for auditory icons? It should use the a and p commands
to send auditory icons to the speech server instead.
This would eliminate the "two sound cards" problem (play and the speech
server competing for the same audio device).

And the last question, mainly for Raman:
Are you planning to implement a pronounce-word command for the speech servers?
When I tell Emacspeak to read the characters of a word, it sends:
l {x}
l {y}
l {z}
and so on.
My speech server then sends the following to Festival:
(tts "<SABLE><SAYAS MODE=\"literal\">x</SAYAS> </SABLE>" 'sable)
(tts "<SABLE><SAYAS MODE=\"literal\">y</SAYAS> </SABLE>" 'sable)
(tts "<SABLE><SAYAS MODE=\"literal\">z</SAYAS> </SABLE>" 'sable)

But if Emacspeak sent a special command for spelling out the letters of a word, I could convert it to:
(tts "<SABLE><SAYAS MODE=\"literal\">xyz</SAYAS> </SABLE>" 'sable)

which would reduce overhead a lot.
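Even without a new Emacspeak command, a server could approximate this on its own side by merging consecutive `l {x}` commands it has already received into one literal SAYAS call. The sketch below is a hypothetical optimisation, not an Emacspeak or Festival feature: `merge_letters` and `sable_literal` are illustrative names, it assumes the letter commands arrive together in one batch, and it keeps the `(tts ... 'sable)` form used above, escaping the inner double quotes so the markup is a valid Scheme string. Commands that are not letters are passed through unchanged for the rest of the server to handle.

```python
"""Sketch: merge runs of Emacspeak 'l {x}' commands into one Festival call.

Hypothetical helpers; assumes the letter commands are read as one batch.
"""
from xml.sax.saxutils import escape


def sable_literal(chars: str) -> str:
    """Build a Scheme (tts ...) expression that spells out `chars`."""
    sable = f'<SABLE><SAYAS MODE="literal">{escape(chars)}</SAYAS> </SABLE>'
    # Escape backslashes, then double quotes, so this is a valid Scheme string.
    scheme_str = sable.replace("\\", "\\\\").replace('"', '\\"')
    return f"(tts \"{scheme_str}\" 'sable)"


def merge_letters(batch: list[str]) -> list[str]:
    """Replace each run of consecutive 'l {x}' lines with one tts expression."""
    out: list[str] = []
    run: list[str] = []
    for line in batch:
        line = line.strip()
        if line.startswith("l {") and line.endswith("}"):
            run.append(line[3:-1])   # the character between the braces
            continue
        if run:                      # a non-letter command ends the run
            out.append(sable_literal("".join(run)))
            run = []
        out.append(line)             # other commands pass through unchanged
    if run:                          # flush a run at the end of the batch
        out.append(sable_literal("".join(run)))
    return out
```

With the batch `l {x}`, `l {y}`, `l {z}`, the result is a single expression equivalent to the combined `xyz` call above instead of three separate ones.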

Regards,
       Mario Lang <lang@zid.tu-graz.ac.at>

-----------------------------------------------------------------------------
       To unsubscribe or change your address send mail to
"emacspeak-request@cs.vassar.edu" with a subject of "unsubscribe" or "help"