Ben Klang wrote:
Yes, that’s exactly what we found. It’s good to know that
res_speech internally isn’t as limited as the Dialplan applications -
I had definitely thought of them as the same thing, which, from your
explanation, is incorrect.
Can res_speech be extended to include TTS as well as ASR, assuming
both are controllable via MRCP?
If so, what about other MRCP functions like Call Progress Analysis or
Answering Machine Detection?
There are no interfaces or anything defined in Asterisk for these, so
it's new stuff being added. The same caveats apply as with anything new. ^_^
CPA/AMD in particular behaves like ASR, and has similar variables (no
input timer, final silence timer, can take a grammar document for …)
During lunch, though, I gave this some more thought and think that
speech recognition should always be a passive action on a channel
(or heck, a bridge). It would sit in the media path, feeding audio
to the speech recognizer and raising events, but would never block.
This would allow it to cooperate easily with everything else in ARI
without requiring a developer to use a Snoop channel and manage it.
It also keeps the "well, if they start speaking, what do I do?"
logic out of Asterisk - it gives that power to the application.
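From an ARI application's point of view, the passive model might look something like the sketch below. To be clear, the event names ("SpeechDetected", "SpeechResult") and the dispatcher are hypothetical - nothing like this exists in ARI today; this only illustrates the non-blocking, event-driven shape being proposed.

```python
# Toy stand-in for an ARI channel with passive speech recognition.
# Event names and the dispatcher are invented for illustration.
from collections import defaultdict

class Channel:
    def __init__(self, name):
        self.name = name
        self._listeners = defaultdict(list)

    def on(self, event, callback):
        # Subscribe without blocking; recognition keeps running
        # in the media path regardless of what the app does next.
        self._listeners[event].append(callback)

    def emit(self, event, payload):
        # Asterisk would raise these as media flows through the channel.
        for cb in self._listeners[event]:
            cb(payload)

results = []
chan = Channel("PJSIP/alice-00000001")
chan.on("SpeechDetected", lambda e: results.append(("started", e)))
chan.on("SpeechResult", lambda e: results.append(("result", e)))

# The application stays free to do anything else in ARI meanwhile;
# events simply arrive as the recognizer produces them.
chan.emit("SpeechDetected", {"channel": chan.name})
chan.emit("SpeechResult", {"text": "check my balance", "confidence": 0.92})
```

The point is that nothing here blocks the channel: the app decides what "they started speaking" means, not Asterisk.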
Yes, that sounds great. Async FTW.
One observation to share: We often use something like SynthAndRecog
(unimrcp-asterisk’s dialplan implementation) to handle both input and
output in a single command. This allows prompts to be “barged”, or
interrupted by speech or DTMF. What happens is that the speech
recognizer runs while the synthesizer is playing back. When the
caller speaks, the recognizer raises an MRCP event, which UniMRCP
uses as a trigger to stop TTS playback. This works well enough,
though occasionally the delay between start-of-speech and TTS being
halted can be noticeable.
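The barge-in flow described above can be modeled roughly as follows. The class and method names are invented for illustration - UniMRCP's real API differs - but the wiring is the same: the recognizer and synthesizer run concurrently, and the recognizer's start-of-input notification is what halts playback.

```python
# Rough model of barge-in: START-OF-INPUT from the recognizer
# resource stops the synthesizer. Names are illustrative only.

class Synthesizer:
    def __init__(self):
        self.playing = False

    def speak(self):
        self.playing = True

    def stop(self):
        self.playing = False

class Recognizer:
    def __init__(self, on_start_of_input):
        self._on_start_of_input = on_start_of_input

    def audio_frame(self, caller_speaking):
        # In MRCP terms, this corresponds to the recognizer firing
        # a START-OF-INPUT event when speech is first detected.
        if caller_speaking:
            self._on_start_of_input()

tts = Synthesizer()
asr = Recognizer(on_start_of_input=tts.stop)

tts.speak()             # prompt playback begins
asr.audio_frame(False)  # silence: the prompt keeps playing
asr.audio_frame(True)   # caller speaks: barge-in stops the prompt
```

The latency concern below is about where that `on_start_of_input` link lives: next to the media, or out in an application across a network round trip.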
What you’re proposing would mean letting the application stop TTS
playback in response to a start-of-speech event. In our experience,
applications can get loaded down and delay those responses even
further. Even in a best-case scenario, the latency of the application
handling this kind of request would be significantly higher than doing
it inside Asterisk. Since this is a very timing-critical operation
(milliseconds count, as a human will pick up on the delay), it might
be good to have an option that combines input with output for the
purpose of barge.
To borrow an example from a similar protocol: Rayo handles this by
allowing all three kinds of commands: Input (for ASR or DTMF), Output
(for TTS or audio file), and Prompt (for a combined Input + Output,
where the Output is linked to stop on a start-of-input event). All
three actions are async, raising the appropriate events as things happen.
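The three Rayo verbs compose neatly: Prompt is just an Output plus an Input, with the start-of-input event wired to stop the output. The sketch below captures that composition in spirit; these classes are illustrative, not the actual Rayo (XEP-0327) stanzas.

```python
# Compact model of Rayo's three command kinds. Prompt composes an
# Output and an Input and links them for barge-in. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Output:
    uri: str
    stopped: bool = False

    def stop(self):
        self.stopped = True

@dataclass
class Input:
    grammar: str
    _listeners: list = field(default_factory=list)

    def on_start_of_input(self, cb):
        self._listeners.append(cb)

    def start_of_input(self):
        for cb in self._listeners:
            cb()

@dataclass
class Prompt:
    output: Output
    input: Input
    barge_in: bool = True

    def start(self):
        if self.barge_in:
            # The whole point of Prompt: the stop-on-speech link lives
            # next to the media, not at the mercy of app latency.
            self.input.on_start_of_input(self.output.stop)

prompt = Prompt(Output("file://prompts/welcome"), Input("builtin:speech"))
prompt.start()
prompt.input.start_of_input()  # caller speaks; output is stopped
```

Because Input and Output also stand alone, an application that doesn't need barge can still use either verb independently.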
My fear with the Prompt case is that we would then have to somehow
jury-rig things to use the existing playback mechanism (to keep
supporting current and future URI schemes) while allowing it to be
influenced by the events. That's why I hesitated.
If we could come up with a clean way to do that, yeah.
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org
asterisk-app-dev mailing list