On Sep 5, 2014, at 1:29 PM, Joshua Colp <jcolp@xxxxxxxxxx> wrote:

> Ben Klang wrote:
>
> <snip>
>
>> Is it really required to use res_speech? If so, can we change the
>> interfaces that ARI presents?
>>
>> Over the last few years we’ve evaluated res_speech vs. the various
>> UniMRCP applications (SynthAndRecog primarily). We’ve always come to the
>> conclusion that the res_speech API either couldn’t give us what we
>> needed, or was not as performant. SynthAndRecog isn’t perfect, but it
>> does a couple of crucial things, perhaps most importantly the
>> combined lifecycle of TTS + ASR so that you can “barge” into a TTS
>> playback before it is finished.
>
> The res_speech module and API is a very thin wrapper over common speech recognition concepts. It does some helpful things, like handling transcoding and providing a state machine, but otherwise it relies on the underlying speech technology to do everything. It doesn't provide anything to the dialplan, nor does it even know about channels.
>
> What you probably found limiting was the interface provided to the dialplan/AGI for speech recognition, with the dialplan applications taking care of things. These wouldn't be used in ARI. We're free to make the interface there whatever we want.

Yes, that’s exactly what we found. It’s good to know that res_speech internally isn’t as limited as the dialplan applications; I had definitely thought of them as the same thing, which from your explanation sounds incorrect.

Can res_speech be extended to include TTS as well as ASR, assuming both are controllable via MRCP? If so, what about other MRCP functions like Call Progress Analysis (CPA) or Answering Machine Detection (AMD)? CPA/AMD in particular behaves like ASR and has similar parameters (a no-input timer, a final-silence timer, the ability to take a grammar document as input, etc.).

> During lunch, though, I gave this some more thought, and I think that speech recognition should always be a passive action on a channel (or heck, a bridge). It would sit in the media path, feeding audio to the speech recognizer and raising events, but would not block. This would allow it to cooperate easily with every other possible thing in ARI without requiring the developer to use a Snoop channel and manage it. It also doesn't put the "well, if they start speaking, what do I do?" logic inside Asterisk; it gives that power to the developer.

Yes, that sounds great. Async FTW.

One observation to share: we often use something like SynthAndRecog (unimrcp-asterisk’s dialplan implementation) to handle both input and output in a single command. This allows prompts to be “barged”, or interrupted by speech or DTMF. The speech recognizer runs while the synthesizer is playing back. When the caller speaks, the recognizer raises an MRCP event, which UniMRCP uses as a trigger to stop TTS playback. This works well enough, though occasionally the delay between start-of-speech and the TTS being stopped can be noticeable.

What you’re proposing would mean letting the application stop TTS playback in response to a start-of-speech event. In our experience, applications can get loaded down and delay those responses even more. Even in the best case, the latency of the application handling this kind of request would be significantly higher than handling it inside Asterisk. Since this is a very timing-critical operation (milliseconds count, as a human will pick up on the delay), it might be good to have an option that combines input with output for the purpose of barge.

To borrow an example from a similar protocol, Rayo handles this by offering three kinds of commands: Input (for ASR or DTMF), Output (for TTS or an audio file), and Prompt (a combined Input + Output, where the Output is linked to stop on a start-of-input event). All three actions are async, raising the appropriate events as things happen.
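A minimal sketch of the Prompt linkage described above, to make the latency argument concrete. It assumes simulated 20 ms media frames and a boolean "is speech" flag per frame in place of a real recognizer; `play_output`, `run_input`, and `prompt` are hypothetical names, not ARI, Rayo, or UniMRCP APIs. The point it shows is that the barge decision runs inside the media engine as a direct callback, so stopping playback never waits on an application round trip, while every step is still reported as an event.

```python
import asyncio
from typing import Callable, List

FRAME_S = 0.02  # assumption: 20 ms media frames


async def play_output(chunks: int, stop: asyncio.Event, events: List[str]) -> None:
    """Hypothetical Output action: plays TTS/audio chunk by chunk until done or stopped."""
    events.append("output-started")
    for _ in range(chunks):
        if stop.is_set():
            events.append("output-stopped")  # barged: playback halted early
            return
        await asyncio.sleep(FRAME_S)  # stand-in for writing one chunk of audio
    events.append("output-complete")


async def run_input(frames: List[bool], final_silence: int,
                    on_start_of_input: Callable[[], None], events: List[str]) -> None:
    """Hypothetical passive Input action: reads frames (True = speech) and raises events."""
    started = False
    silence = 0
    for is_speech in frames:
        await asyncio.sleep(FRAME_S)  # stand-in for receiving one media frame
        if is_speech:
            silence = 0
            if not started:
                started = True
                events.append("start-of-input")
                on_start_of_input()  # runs in-engine, no round trip to the app
        elif started:
            silence += 1
            if silence >= final_silence:  # simulated final-silence timer expired
                break
    events.append("input-complete" if started else "no-input")


async def prompt(frames: List[bool], tts_chunks: int, final_silence: int = 3,
                 barge_in: bool = True) -> List[str]:
    """Hypothetical Prompt = Input + Output, with Output linked to stop on start-of-input."""
    events: List[str] = []
    stop = asyncio.Event()

    def on_start_of_input() -> None:
        if barge_in:
            stop.set()  # the linkage: signal Output directly

    await asyncio.gather(
        play_output(tts_chunks, stop, events),
        run_input(frames, final_silence, on_start_of_input, events),
    )
    return events


if __name__ == "__main__":
    # Caller starts talking on the 3rd frame while a long prompt is still playing.
    print(asyncio.run(prompt([False, False, True, True, False, False, False], tts_chunks=50)))
```

Passing `barge_in=False` leaves Input and Output independent, so the application would receive `start-of-input` and decide for itself what to do, which corresponds to the passive model proposed above.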
As you mentioned previously, there’s definitely a use case for starting a recognizer and moving on with your business, firing events at the app and letting the app decide what to do with them.

/BAK/

> Thoughts?
>
> --
> Joshua Colp
> Digium, Inc. | Senior Software Developer
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - US
> Check us out at: www.digium.com & www.asterisk.org
>
> _______________________________________________
> asterisk-app-dev mailing list
> asterisk-app-dev@xxxxxxxxxxxxxxxx
> http://lists.digium.com/cgi-bin/mailman/listinfo/asterisk-app-dev