PJSIP for high scale SIP server


 



Gang,

Thanks for your response.

Yes, you're right that only bono instances need to handle lots of TCP connections.  It's not clear whether the TCP connections are the limiting factor, though, or whether it's the message rate.  sprout nodes don't need to handle as many messages as bono because:

*         sprout nodes are only in the signaling path once, while bono nodes are in the signaling path twice (once on the calling party side, and once on the called party side)

*         sprout drops out of the signaling path once the dialog is established, while bono stays in

*         sprout nodes have a lot more non-transport work to do (e.g. querying the HSS, doing ENUM lookups), so the transport thread load is a smaller proportion of the total load.

Yes, the transport thread we run is the pjsip_thread you found - good spot!

With 50k TCP connections, I think we're looking at ~1.2k messages per second, but this is based on some rough calculations rather than metrics (which I'd like to add).  I also haven't delved too far into where on this thread the bottleneck is.  I've been approaching it from the perspective of whether we could run multiple transport threads, though I appreciate that, depending on where the bottleneck is, adding more threads might not solve the problem.
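To put the rough figures above in perspective, here is a small back-of-the-envelope calculation. It only restates the two numbers quoted in this thread (50k connections, ~1.2k messages per second); the per-message time budget assumes, purely for illustration, that a single transport thread spends one whole core on nothing but these messages.

```python
# Figures quoted in the thread; the message rate is an estimate, not a measurement.
connections = 50_000
messages_per_second = 1_200

# Average rate seen on any one connection.
per_connection_rate = messages_per_second / connections

# Average gap between messages on one connection, in seconds.
seconds_between_messages = connections / messages_per_second

# Assumption for illustration: one thread, one fully used core.
# Then each message can use at most this many microseconds of that thread.
time_budget_us = 1_000_000 / messages_per_second

print(f"{per_connection_rate:.3f} msg/s per connection")
print(f"one message every {seconds_between_messages:.1f} s per connection")
print(f"about {time_budget_us:.0f} us of transport-thread time per message")
```

If the thread is saturated at this rate, the budget figure suggests where to look first: parsing or timer handling would need to cost a large fraction of a millisecond per message to be the bottleneck.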

I'll do some more investigation - thanks for your input!

Cheers,

Matt

From: pjsip [mailto:pjsip-bounces@xxxxxxxxxxxxxxx] On Behalf Of Gang Liu
Sent: 07 August 2013 05:55
To: pjsip list
Subject: Re: PJSIP for high scale SIP server

Matt,
      Based on my understanding of the sprout source code, only bono instances need to handle many TCP connections, because there are TCP connection pools between bono and sprout.

      I saw that there are some worker threads managed by the STACK module, which process rx messages from a queue of cloned messages. The pjsip thread calls pjsip_endpt_handle_events() (polling the timer heap and the ioqueue).
      Did you mean that the transport thread is the pjsip thread defined in stack.cpp by
                   static int pjsip_thread(void *p)

      If yes, this transport thread/pjsip thread is polling the ioqueue and the timer heap. Because the STACK module clones rx messages onto a queue that is processed later by the worker threads, this transport thread is actually only working on network I/O events (epoll), SIP message parsing (transport manager layer) and the timer heap.
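The division of labour described above can be modelled in a few lines. This is a minimal sketch in Python using only the standard library, not the sprout/bono code and not the PJSIP API: the function name `run_pipeline`, the toy "parse" step and the sample messages are all hypothetical. One thread plays the transport thread (receive, parse, enqueue a copy); the remaining threads play the workers that drain the queue.

```python
import queue
import threading


def run_pipeline(raw_messages, num_workers=2):
    """Model one transport thread feeding a pool of worker threads."""
    rx_queue = queue.Queue()
    handled = []
    handled_lock = threading.Lock()

    def transport_thread():
        # Stands in for the I/O poll and parse step; only this thread "reads the network".
        for raw in raw_messages:
            method = raw.split(" ", 1)[0]  # toy parse: first token of the request line
            rx_queue.put((method, raw))    # hand a copy over to the workers
        for _ in range(num_workers):
            rx_queue.put(None)             # one shutdown sentinel per worker

    def worker_thread():
        # Stands in for the non-transport work (routing, lookups, and so on).
        while True:
            item = rx_queue.get()
            if item is None:
                break
            method, _raw = item
            with handled_lock:
                handled.append(method)

    threads = [threading.Thread(target=transport_thread)]
    threads += [threading.Thread(target=worker_thread) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(handled)


result = run_pipeline([
    "INVITE sip:a@example.com SIP/2.0",
    "REGISTER sip:example.com SIP/2.0",
    "BYE sip:a@example.com SIP/2.0",
])
print(result)
```

In this model the workers can be scaled by raising `num_workers`, but the receive-and-parse step stays on one thread, which is the constraint being discussed in this thread.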

      I am wondering how many messages per second or transactions per second bono (edge proxy) needs to handle with 50k concurrent TCP connections.  Which is the bottleneck: network events, the SIP parser or the timer heap?

     Any guidelines for using bono as an edge proxy in front of kamailio/opensips would be helpful. That would make it easier to stress-test multiple transport threads.

regards,
Gang
On Fri, Aug 2, 2013 at 6:21 PM, Matt Williams <Matt.Williams at metaswitch.com> wrote:
Gang,

Yes, we're definitely looking at high-scale here - we currently run with 50k TCP connections on one EC2 m1.small (single core).  We're looking to scale up to 25M TCP connections total.

Because our architecture is stateless, we smoothly scale horizontally but having 500 nodes to manage is a bit of a headache, so the option to run on fewer larger (multi-core) machines would be nice.  Unfortunately, we can't take advantage of multi-core machines because the transport thread itself uses a significant proportion of the total CPU (the process is a simple edge proxy, so the worker thread is fairly lightly-loaded).
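The 500-node figure follows directly from the numbers above, as the short calculation below shows. The multi-core part is hypothetical: it assumes that per-node capacity would scale linearly with core count if the transport work could be spread across cores, which is exactly what is not possible today. The function name `nodes_needed` and the 8-core example are illustrative only.

```python
import math

TARGET_CONNECTIONS = 25_000_000   # total target quoted in the thread
CONNECTIONS_PER_CORE = 50_000     # what one single-core m1.small handles today


def nodes_needed(cores_per_node):
    """Nodes required, ASSUMING capacity scales linearly with cores per node."""
    capacity_per_node = CONNECTIONS_PER_CORE * cores_per_node
    return math.ceil(TARGET_CONNECTIONS / capacity_per_node)


single_core_nodes = nodes_needed(1)   # today's deployment model
eight_core_nodes = nodes_needed(8)    # hypothetical 8-core machines

print(single_core_nodes, eight_core_nodes)
```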

Cheers,

Matt

From: pjsip [mailto:pjsip-bounces at lists.pjsip.org] On Behalf Of Gang Liu
Sent: 02 August 2013 04:17

To: pjsip list
Subject: Re: PJSIP for high scale SIP server

For small and medium-scale projects, a single transport thread is enough.

It might be beneficial to use multiple transport threads to handle 50000 TLS connections per pjsip endpoint.

regards,
Gang
On Thu, Aug 1, 2013 at 9:13 PM, Matt Williams <Matt.Williams at metaswitch.com> wrote:
Dennis,

Thanks for your email.

Yes, I'd noticed that Asterisk was switching to PJSIP.  Unfortunately, it only uses a single transport thread too - it seems that's the approach everyone uses.

Thanks again,

Matt

From: pjsip [mailto:pjsip-bounces at lists.pjsip.org] On Behalf Of Dennis Guse
Sent: 01 August 2013 13:30

To: pjsip list
Subject: Re: PJSIP for high scale SIP server

Asterisk is switching towards PJSIP with the next version 12 (tbd October).
Probably there is some experience with this kind of problem.

https://wiki.asterisk.org/wiki/display/AST/New+SIP+channel+driver
http://lists.digium.com/pipermail/asterisk-dev/2012-December/057997.html

---
Dennis Guse

On Thu, Aug 1, 2013 at 10:04 AM, Matt Williams <Matt.Williams at metaswitch.com> wrote:
Gang,

Thanks for your response.

Your project sounded interesting - it's a shame it didn't continue.  It's good to hear (in some ways) that we're not the only ones to hit this issue, and that you resolved it in the same way as we have.

I'll keep digging on the multi-threading issue - it would be good to be able to run multiple transport threads.

Thanks,

Matt

From: pjsip [mailto:pjsip-bounces at lists.pjsip.org] On Behalf Of Gang Liu
Sent: 31 July 2013 04:01
To: pjsip list
Subject: Re: PJSIP for high scale SIP server

Four years ago, I had a class 4 routing demo project which was required to handle 1000 CPS. I spent about one month working with pjsip 0.9.0 to implement a B2BUA which could handle more than 2000 call legs per second over UDP transport. The initial design also used multiple pjsip worker threads. It worked very well in the lab, but it hit some race conditions and deadlocks when handling real traffic. I remember one deadlock case where one thread was handling an INVITE retransmission timer timeout while, at the same time, another thread received a 100 Trying packet from the network. My solution was to offload all CPU/IO-bound processing logic to other threads and use only one thread to call pjsip_endpt_handle_events() and all other pjsip functions. I would have liked to spend more time tracing the problem, but that project ended soon afterwards for business reasons.
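The workaround described above (one thread owns every call into the stack, other threads do the heavy work and hand results back) can be sketched as a single-owner job loop. This is an illustrative model in Python, not the original B2BUA code and not the PJSIP API; `SingleOwnerLoop`, `send_response` and `cpu_bound_worker` are hypothetical names, and `send_response` merely stands in for "any call into the SIP stack".

```python
import queue
import threading


class SingleOwnerLoop:
    """Runs every posted job on one dedicated thread; other threads only post."""

    def __init__(self):
        self._jobs = queue.Queue()
        self.executed_on = []  # thread ids that ran jobs, kept for inspection
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            job = self._jobs.get()
            if job is None:      # shutdown sentinel
                break
            self.executed_on.append(threading.get_ident())
            job()

    def post(self, job):
        """Safe to call from any thread."""
        self._jobs.put(job)

    def stop(self):
        self._jobs.put(None)
        self._thread.join()

    @property
    def owner_id(self):
        return self._thread.ident


loop = SingleOwnerLoop()
state = {"responses_sent": 0}


def send_response():
    # Stand-in for a stack call. No lock needed: only the owner thread runs it.
    state["responses_sent"] += 1


def cpu_bound_worker():
    # Heavy work (routing lookups and the like) would happen here, off the owner
    # thread; the stack call itself is handed back to the owner.
    loop.post(send_response)


workers = [threading.Thread(target=cpu_bound_worker) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
loop.stop()

print(state["responses_sent"], set(loop.executed_on) == {loop.owner_id})
```

The design trades parallelism in the stack for the absence of cross-thread races: timer expiry and incoming packets are serialised on one thread, so the kind of deadlock described above cannot arise between two stack threads.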

regards,
Gang
On Sat, Jul 6, 2013 at 12:31 AM, Matt Williams <Matt.Williams at metaswitch.com> wrote:
Hi,

I'm working on Project Clearwater (http://www.projectclearwater.org/), an open source highly-scalable IMS (IP Multimedia Subsystem) implementation.

We're using PJSIP as our SIP stack.  Most of the trails I've seen on the mailing list have been about using PJSIP for SIP clients, but is anyone using it (like us) server-side, e.g. for proxies or B2BUAs?  Each instance of our "bono" edge proxy server supports 50000 incoming SIP/TCP connections (and the limitation we then hit is with Amazon AWS EC2, not the software itself), but we're unable to have more than one transport thread (i.e. running pjsip_endpt_handle_events).  If we have more than one, we see crashes that seem to be related to concurrent accesses to shared data structures from multiple threads.

Does anyone have any experience of running multiple transport threads, or any pointers for using PJSIP at high scale?  I'm happy to investigate more (and share crash dumps if that's useful), but wanted to check whether anyone else had seen this first.

Thanks,

Matt


_______________________________________________
Visit our blog: http://blog.pjsip.org

pjsip mailing list
pjsip at lists.pjsip.org<mailto:pjsip at lists.pjsip.org>
http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org




