Re: WebSocket Stasis Control Best Practice

Krandon <krandon.bruse@xxxxxxxxx> · Wed, 18 Jun 2014 14:38:43 -0500

                On Wednesday, June 18, 2014 at 9:01 AM, Matthew Jordan wrote:

On Mon, Jun 16, 2014 at 6:03 PM, Krandon <krandon.bruse@xxxxxxxxx> wrote:

                    On Monday, June 16, 2014 at 9:12 AM, Matthew Jordan wrote:
Hey Matt/Everyone,

I actually really like the ideology behind this. The idea that ARI is truly a platform on which people can build complex telephony apps without having to write them in C is really neat. Also, since it's built on WebSockets, it is inherently very horizontally scalable, which was somewhat of a challenge previously. I do understand there is some barrier to entry for people who are transitioning from AMI/AGI, or even dialplan, if they are "porting" an app, because functionality/apps have to be essentially rewritten/rethought. As with any project, however, I strongly believe that more and more apps will become the "core" of several libraries for interfacing with ARI. It gives a lot of power to the individual using the ARI interface.

Thanks!

With the TALK_DETECT function, I could re-create AMD, and it would actually be way more extensible (not to mention decently fast).
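
A rough sketch of that idea. It assumes the app has enabled talk detection on the channel (e.g. via `TALK_DETECT(set)`) and receives ARI `ChannelTalkingFinished` events over the WebSocket. The `classify_greeting` helper and the 2500 ms cutoff are hypothetical illustrations, not a tuned AMD algorithm.

```python
import json
from typing import Optional

# Hypothetical cutoff: a first burst of speech longer than this is treated
# as a recorded greeting. A real AMD replacement would need tuning.
MACHINE_GREETING_MS = 2500


def classify_greeting(raw_event: str) -> Optional[str]:
    """Classify a call from a ChannelTalkingFinished event.

    Intended to be called on the first ChannelTalkingFinished seen for a
    channel. Returns "machine", "human", or None for any other event type.
    """
    event = json.loads(raw_event)
    if event.get("type") != "ChannelTalkingFinished":
        return None
    # ChannelTalkingFinished reports how long talking lasted, in milliseconds.
    duration_ms = event.get("duration", 0)
    return "machine" if duration_ms > MACHINE_GREETING_MS else "human"


if __name__ == "__main__":
    sample = json.dumps({"type": "ChannelTalkingFinished", "duration": 4100})
    print(classify_greeting(sample))  # machine
```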

I have at least tested the scalability of concurrent calls and requests generating lots of events, and I have been extremely impressed so far. This may be the scalability freak in me, but is there a way to tell ARI not to send me events I don't care about when first connecting to /events? For example, RTP-related events. I remember seeing something about not having pub/sub for a reason. I could just discard the packet without processing it, but with the load we've been dealing with, even that is very expensive.
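
The consumer-side discard looks roughly like the sketch below: parse each WebSocket frame once and drop event types the app doesn't handle before any further work. The names in `IGNORED_TYPES` are only examples (an assumption); the set should match whatever events actually arrive for the app.

```python
import json
from typing import Callable, Dict, Optional

# Example event types an app might not care about (assumption; adjust to
# the events your application actually receives).
IGNORED_TYPES = frozenset({"ChannelVarset", "ChannelDialplan"})


def dispatch(raw_frame: str,
             handlers: Dict[str, Callable[[dict], None]]) -> Optional[str]:
    """Parse one ARI WebSocket frame and route it by its "type" field.

    Ignored or unhandled event types are dropped after a dict/set lookup,
    so no handler work is done for them. Returns the handled event type,
    or None if the event was dropped.
    """
    event = json.loads(raw_frame)
    event_type = event.get("type")
    if event_type in IGNORED_TYPES:
        return None
    handler = handlers.get(event_type)
    if handler is None:
        return None
    handler(event)
    return event_type
```

Note that the JSON parse still happens for every frame. That remaining cost is why suppressing events before they are sent would help under heavy load.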

Some background, and then a few thoughts on this topic.

When we started Asterisk 12, we had two goals with the APIs:

(1) Fix the presentation of the lifetime of a channel to external entities
(2) Provide something that lets people build their own Queue, VoiceMail, ConfBridge, and other communications applications
Understood - and that is very much needed. I view AMI as more of an abstraction for controlling Asterisk Applications, and ARI as a non-C way to build Asterisk Applications (not to mention way more scalable: writing a manager proxy with ZeroMQ was no fun compared to the plethora of tools available to help scale WebSocket-based applications). It really makes scaling out horizontally much easier.

In order to do both, we needed to find a way to unify a variety of information sources in Asterisk. In prior versions of Asterisk, a lot of information that was generated was tightly coupled with the producer of that information. For example, AMI events were strings that were formed directly by the producers of the events, information about who was bridged with whom was directly produced by MeetMe, ConfBridge, and code in features.c, etc. While tightly coupling the producers/consumers is "fast" (no layers of abstraction to get in the way), it had a lot of drawbacks:

(1) Complexity in large portions of the code base, and duplicated (and inconsistent) logic. Think CDRs and you'll know what I mean.
(2) No way to 'share' information. This was the big one: the event subsystem and AMI had a large amount of the information we would need for any API. Somehow we had to unify that.

Thus was born an internal message bus - Stasis (which is where the dialplan application got its name from) - which is a pub/sub message bus internal to Asterisk that sucked up the event subsystem, AMI events, and a whole host of other stuff.

In Asterisk 12, everything is built on top of that message bus - CEL, CDRs, AMI, ARI, certain interesting features of attended transfers/parking, several resource modules (res_statsd), and other things.
We are using res_statsd now with graphite - a serious improvement over the previous SNMP-based monitoring. Again, falling into the "this has so many tools to help us grow horizontally" category. 

As with anything, there are no free lunches: the message bus allowed us to unify state and create a consistent picture of channels, bridges, and other objects. It decoupled the logic in Asterisk, making the source *much* easier to maintain, debug, fix, and extend. On the other hand, producing events can be (but is not always) a bit more expensive than it was in previous versions. A recent patch (https://reviewboard.asterisk.org/r/3568/) cleaned up some of the more inefficient production of messages, but there's still room for improvement.

One area that would be interesting to explore is what you suggested - eliminating messages entirely. Currently, if you filter events out through AMI, this is done after the Stasis message is produced, published, received, and converted into an AMI event. The savings you get from filtering these events out is relatively minor on the Asterisk side - on the consumer side, it is still nice obviously (as you don't have to parse and toss out messages you don't care about). On the other hand, if you don't care about RTCP information (for example), why produce the Stasis message at all? Allowing the elimination of certain classes of messages has the potential to cut down traffic substantially in the message bus.

There would, of course, be tradeoffs: for example, a branch currently in development consumes those RTCP messages and transmits the data to a Homer SIP Capture server for live call quality monitoring. Killing the producer means no one gets the message. We'd want to be very careful about which messages we allow folks to eliminate - some messages that seem 'chatty' and are possible candidates for elimination could have strange ripple effects in various core parts of the code (CDRs in particular).
Fantastic - I guess my thinking behind not receiving the message on the client side is that the client (in our case) is more complex than our Asterisk implementation (the first time this has happened for us, which is a HUGE A+ to the Asterisk dev team), so it's easy for us to just throw Asterisk boxes at the problem of scale. In our testing, anyway, the walls we've hit have been in media playback and in setting up/tearing down channels. I'm not sure how to quantify the difference between generating and not generating Stasis events for something like RTCP, but my gut tells me it's not a huge savings, though I could be wrong if something is super chatty.

Two other points:
(1) ARI does not produce the RTCP events. The events you get over ARI are more limited in scope. Since we didn't have to worry about backwards compatibility with the first version of the interface, we took the opportunity to not put in every event someone has created for AMI. I'm sure over time this will change - but right now, at any rate, the events should be much more meaningful for everyone.

(2) Today, unlike AMI, you only get events for things you are subscribed to - not the entire system. This reduces the processing on the consumer side, at the cost of having to understand and manage the subscription model. I think there are benefits to having a 'global pipeline' as well as 'show me only the things in my app' - it would be a good discussion to have at AstriDevCon this year.
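
In the subscription model, an app asks for events from an object through ARI's applications resource (`POST /ari/applications/{applicationName}/subscription` with an `eventSource` such as `channel:{channelId}`). The minimal sketch below only builds that request. The base URL, app name, and channel id are placeholders (assumptions), and authentication and sending are left out.

```python
from typing import Tuple
from urllib.parse import quote, urlencode

# Placeholder base URL (assumption): Asterisk's HTTP server with ARI enabled.
ARI_BASE = "http://localhost:8088/ari"


def subscription_request(app: str, event_source: str) -> Tuple[str, str]:
    """Build the (method, url) pair that subscribes `app` to `event_source`.

    event_source uses ARI's URI form, e.g. "channel:<id>" or "bridge:<id>".
    """
    path = f"{ARI_BASE}/applications/{quote(app, safe='')}/subscription"
    query = urlencode({"eventSource": event_source})
    return ("POST", f"{path}?{query}")


if __name__ == "__main__":
    method, url = subscription_request("my-app", "channel:1403125123.42")
    print(method, url)
```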

Or even: set channel variables in Stasis, /continue to execute a dialplan app, go back into Stasis, and read the resulting channel vars. I actually don't think it's that much of a "hack", and it could make for a really elegant app/app infrastructure, though not quite as sexy (I mean, extensible).
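
That round trip maps onto three ARI channel requests, plus a wait for the channel to come back into Stasis. The sketch below only lists the requests in order. The base URL, context, extension, and variable names (`ari-helpers`, `run-app`, `APP_ARG`, `APP_RESULT`) are hypothetical. The dialplan at that extension is assumed to run the app and then call `Stasis()` again.

```python
from typing import List, Tuple
from urllib.parse import urlencode

ARI_BASE = "http://localhost:8088/ari"  # placeholder (assumption)


def round_trip_plan(channel_id: str, arg_value: str) -> List[Tuple[str, str]]:
    """Ordered (method, url) requests for: set var -> dialplan -> read var.

    Step 3 should only be sent after a new StasisStart arrives for the
    channel, i.e. once the dialplan has handed it back to the app.
    """
    chan = f"{ARI_BASE}/channels/{channel_id}"
    return [
        # 1. Pass input to the dialplan app via a (hypothetical) channel variable.
        ("POST", f"{chan}/variable?" + urlencode({"variable": "APP_ARG", "value": arg_value})),
        # 2. Leave Stasis and continue in the (hypothetical) helper context.
        ("POST", f"{chan}/continue?" + urlencode({"context": "ari-helpers", "extension": "run-app", "priority": 1})),
        # 3. After re-entering Stasis, read what the dialplan app left behind.
        ("GET", f"{chan}/variable?" + urlencode({"variable": "APP_RESULT"})),
    ]


if __name__ == "__main__":
    for method, url in round_trip_plan("1403125123.42", "hello"):
        print(method, url)
```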

That does kind of stink. There are a few other situations I've seen as well where a channel gets tossed into a Stasis app just so the external app knows to make a subscription to it.

Giving access to change some aspect of a channel outside of an application is a one-way ticket - you can do it and stay backwards compatible, but once you've granted access to a channel, you can't take it away. There are some functions that modify the state of a channel in 'special' ways, such as the SHELL function, that make me a little concerned about removing that restriction without some careful thought.
The above solution of passing channel variables makes it very straightforward. We've even used AMI to rewrite dialplan to get this working before, but now I can just set the right channel variables and pass them directly to the apps. Pretty straightforward!

Gone are the days of using manager to throw two people into a queue, crossing your fingers and hoping some weird masquerade a) makes it to a CDR which can be tracked and b) doesn't blow up - big props to the Asterisk dev team on that! (unrelated to ARI, but still very important for the growth and implementation of ARI - imho)

I'll report back results of an AMD implementation using TALK_DETECT - checking out the latest Asterisk 12 now.

Thanks! Looking forward to what you find out -

Matt

Thanks again for the support Matt. This will really speed up our development time. 

-- 
Matthew Jordan
Digium, Inc. | Engineering Manager
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org

_______________________________________________
asterisk-app-dev mailing list
asterisk-app-dev@xxxxxxxxxxxxxxxx
http://lists.digium.com/cgi-bin/mailman/listinfo/asterisk-app-dev
