Is it possible to bind SIP messages belonging to a dialog to the same SIP worker thread?


 



On Fri, May 22, 2009 at 4:16 PM, Gang Liu <gangban.lau at gmail.com> wrote:

> benny,
>      Is it possible to bind SIP messages belonging to a dialog to the same
> SIP worker thread?
>      Please consider the SIP message flow below:
>      UAC                        UAS
>                --> INVITE-->
>                <-- 100 <--
>                <-- 180 <--
>                <-- 487 <--
>                --> ACK -->
>                --> CANCEL ->
>
>     Yes, it seems wrong for the UAC to send CANCEL after ACK. But it is
> possible that the UAC sends CANCEL before the 487, which was delayed by
> network jitter.
>

487 is only sent after CANCEL, so that's unusual. But I'll just assume that
you mean this problem can happen with other final responses.


>      In theory, there may be a problem when two SIP worker threads handle
> the ACK and the CANCEL at the same time. For example, the ACK thread will
> try to destroy the tsx and its mutex when entering the DESTROY state, but
> before this the CANCEL thread got the tsx mutex.
>      I know it will be rare, and I couldn't reproduce this with sipp on my
> testbed either. But it would be helpful if we could assign messages of one
> dialog to the same worker thread.
>
>

This issue has been identified, though there's no solution yet. You can
search for "PJ_TODO(FIX_RACE_CONDITION_HERE)" in sip_transaction.c for
places where some protection is needed to prevent this.

My feeling is that setting thread affinity for a dialog is not a good
solution, as that would involve message passing between one thread and
another, which is inefficient and I'm sure would introduce additional
complexity.
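
Just to make concrete what that message passing would look like, here is a
minimal sketch, with invented names rather than any real PJSIP API, of
hashing the Call-ID so that every message of one dialog always lands on the
same worker queue:

#include <pthread.h>

#define NUM_WORKERS 4

struct rx_msg;                                     /* hypothetical parsed message */
void queue_push(int worker, struct rx_msg *msg);   /* hypothetical queue helper   */

static pthread_mutex_t queue_mutex[NUM_WORKERS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static pthread_cond_t queue_cond[NUM_WORKERS] = {
    PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER,
    PTHREAD_COND_INITIALIZER, PTHREAD_COND_INITIALIZER
};

/* All messages of one dialog carry the same Call-ID, so hashing it
 * always picks the same worker thread. */
static void dispatch_msg(struct rx_msg *msg, const char *call_id)
{
    unsigned hash = 0;
    int w;

    while (*call_id)
        hash = hash * 31 + (unsigned char)*call_id++;
    w = (int)(hash % NUM_WORKERS);

    pthread_mutex_lock(&queue_mutex[w]);
    queue_push(w, msg);                    /* hand the message over to worker w */
    pthread_cond_signal(&queue_cond[w]);
    pthread_mutex_unlock(&queue_mutex[w]);
}

So on top of the normal processing, every single message would pay for a
queue operation, a lock, and a thread wake-up.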

Your mail gives me an idea of how we might be able to solve this. If you look
at the pseudocode for the part that has the race condition:

find_tsx()
{
   lock(hash_table_mutex);
   tsx = find_tsx_in_hash_table();
   unlock(hash_table_mutex);

   // potential race condition here: once the hash table mutex is
   // released, another thread may destroy the tsx (and its mutex)
   // before we manage to lock it

   lock(tsx);
   return tsx;
}
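
To spell out why that window is dangerous, here is a rough sketch (with
made-up names, not the actual sip_transaction.c code) of what the other
thread, the one terminating the transaction, is doing at the same time:

#include <pthread.h>
#include <stdlib.h>

struct tsx {                            /* hypothetical transaction object */
    pthread_mutex_t mutex;
    /* ... */
};

extern pthread_mutex_t hash_table_mutex;
void unregister_tsx(struct tsx *tsx);   /* hypothetical hash-table removal */

/* The terminating thread (e.g. the one handling the ACK).  If this runs
 * inside the window above, the finder's lock(tsx) ends up locking a
 * mutex that has already been destroyed and freed. */
static void destroy_tsx(struct tsx *tsx)
{
    pthread_mutex_lock(&hash_table_mutex);
    unregister_tsx(tsx);
    pthread_mutex_unlock(&hash_table_mutex);

    pthread_mutex_destroy(&tsx->mutex);
    free(tsx);
}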

We should be able to fix the problem by adding another mutex like this:

find_tsx()
{
   lock(another_mutex);
   lock(hash_table_mutex);
   tsx = find_tsx_in_hash_table();
   unlock(hash_table_mutex);

   // the same window as above, but now serialized by another_mutex

   lock(tsx);
   unlock(another_mutex);
   return tsx;
}
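
For this to actually close the window, the destroying side would of course
have to take the same mutex before it tears the tsx down. Roughly (again
just a sketch, reusing the made-up names from the sketch above, and assuming
the caller has already released the tsx lock):

extern pthread_mutex_t another_mutex;    /* the new mutex introduced above */

static void destroy_tsx(struct tsx *tsx)
{
    /* Same lock order as find_tsx(): another_mutex, then the hash table
     * mutex, then the tsx mutex, so the two paths cannot deadlock. */
    pthread_mutex_lock(&another_mutex);

    pthread_mutex_lock(&hash_table_mutex);
    unregister_tsx(tsx);
    pthread_mutex_unlock(&hash_table_mutex);

    /* Wait for any thread that already got the tsx from find_tsx() and
     * still holds its lock, then tear it down.  No new finder can reach
     * the tsx any more: it is out of the hash table and another_mutex
     * is held. */
    pthread_mutex_lock(&tsx->mutex);
    pthread_mutex_unlock(&tsx->mutex);

    pthread_mutex_destroy(&tsx->mutex);
    free(tsx);

    pthread_mutex_unlock(&another_mutex);
}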


But so far nobody seems to have complained about this anyway, so probably
it's ok after all. :)

cheers
 Benny



> regards,
> Gang
>

