debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used.

At 02:47 AM 5/20/2009, Gang Liu wrote:
>Everyone has his own preferred method to debug and trace problems. I have
>nothing to say against that.

Certainly.  I just offered mine to try and help this situation and 
kick ideas around.

>
>But how many people have read this document before using the pjsua API?
>Not many, I think.

Nope I certainly don't recall reading that document.

>
>I will close my mouth if you don't like some advice.
>
>I tried a dual quad-core Xeon server with 1000 cps and 20000 channels. Is
>that enough?

OK, let's talk.  20000 channels: I don't get anywhere near that
number.  Part of the issue is that my app is fully multithreaded.
My testing on Windows XP/XP x64 indicates Windows will give you
new thread allocations but simply fail to give them a
timeslice anytime soon.  That test was run entirely outside pjsip, just
to see where the threading limits were.  The limit is around
3100 threads.  You can go higher with a smaller default stack.  The
default stack on Windows is 1 MB, so that becomes
an issue in and of itself.  But alas, I have not been able to duplicate
the result of going higher in thread count with a smaller
default stack size.  The limit I saw there was 13,000 threads.
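
For a rough sense of why the default stack size matters, here is a minimal sketch that assumes the thread ceiling is set only by how many stack reservations fit into the process's user address space. The function name and the address-space sizes you pass in are illustrative assumptions, not measurements from the tests above. Other limits (heap, DLLs, kernel resources, scheduling) also apply, so treat the result as a bound from address space alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many thread stacks of `stack_reserve` bytes
 * fit into `user_space` bytes of address space. Considers address space
 * only; heap, loaded modules, and kernel-side limits are ignored. */
static uint64_t max_threads_by_stack(uint64_t user_space, uint64_t stack_reserve)
{
    assert(stack_reserve > 0);   /* a zero reservation makes no sense here */
    return user_space / stack_reserve;
}
```

On Win32 the per-thread reservation can be set through the dwStackSize argument of CreateThread() together with the STACK_SIZE_PARAM_IS_A_RESERVATION flag.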

But now for my question on your test.  You do 1000 cps, which I assume
is calls per second?
Call duration must be short, with not much happening after the call is
established.

My basic hammer test is to do this:
Box A                        Box B
makecall to Box B            receive call
record call to Box B         makecall to Box A as a separate call
Play a 3 minute message      bridge incoming and outbound calls
                             record the conversation.

This is all done with heavy use of the pjsua media library.  And I am 
not able to go over 93 callers on either box.  Now
I know the pjsua media library is the limitation.  But take that away 
and I am lost.  My question to you is have you come
up with a way to do audio functionality without the pjsua media library?

>
>I know it may be a difference between the pjsip-ua API and the pjsua
>API. But the key is that our own program logic design must follow this
>guide, and we must spend enough time to understand why we need to follow
>it. Then the world will be easier.
>
>regards,
>Gang
>On Wed, May 20, 2009 at 3:26 PM, M.S. <hamstiede at yahoo.de> wrote:
>Hello Gang,
>
>Reading the wiki guideline is the first step to resolving the
>problems, but if you have a multi-threading/hyper-threading/
>multi-processor system and many connections for one agent,
>the race conditions will increase.
>I think I have read the whole pjsip mail archive, and every month I hear
>about deadlock situations, infinite loops, etc.
>In my own applications I often use __FILE__ and __LINE__ debug
>outputs for mutex systems. It works fine!
>
>
>
>regards
>    mark
>
>
>
>From: Gang Liu <gangban.lau at gmail.com>
>To: pjsip list <pjsip at lists.pjsip.org>
>Sent: Wednesday, May 20, 2009, 05:30:51
>Subject: Re: [pjsip] debug tips for cpu usage going wrong, deadlock
>issues, within pjsip or not. win32 specific thread profiling function used.
>
>http://trac.pjsip.org/repos/wiki/PJSUA_Locks
>
>Is this guideline useful?
>Indeed, it is still possible to create some SIP message flow that
>causes a PJSUA or PJSIP-UA deadlock. But this is another topic.
>
>regards,
>Gang
>
>On Mon, May 18, 2009 at 4:11 PM, M.S. <hamstiede at yahoo.de> wrote:
>Thank you for your detailed debug summary. I am using Linux, but
>__FILE__ and __LINE__ work there too.
>For the thread tagging, I will use the Linux PIDs (process IDs).
>
>I thought I had a deadlock because:
>- with 1 or 2 connections (calls), I see < 5% processor use
>(small ARM system).
>- with 4 connections, I get 80-100% processor use (for one thread only)
>after a few seconds.
>(For each media stream I use a separate conference bridge. All of these
>threads use only < 2% CPU.)
>- after this, I get some deadlock information from the stack, like:
>
>possible deadlock ........
>
>Questions:
>Is it possible that this could happen if any stack state machine
>checks its own mutex, like "bool IsPjsuaLocked()"?
>
>I think I will build the stuff you wrote into pjlib (thread
>tagging and detailed mutex logging with __FILE__ and __LINE__).
>
>
>thank you
>
>     Mark
>
>
>
>From: David Clark <vdc1048 at tx.rr.com>
>To: pjsip list <pjsip at lists.pjsip.org>; pjsip list
><pjsip at lists.pjsip.org>
>Sent: Saturday, May 16, 2009, 09:08:42
>Subject: [pjsip] debug tips for cpu usage going wrong, deadlock
>issues, within pjsip or not. win32 specific thread profiling function used.
>
>No, 100% CPU might well be a bug, but it is most likely not a deadlock.  A deadlock
>is where you are blocked by the OS waiting for a mutex you will never get.
>This blocked state uses 0% CPU.  The usual symptom I see is that after some
>number of calls, pjsip just does not communicate with anyone.  Even SJPhone,
>the SIP phone application, is ignored.
>
>But how to resolve your CPU-goes-to-100% issue?  Not putting
>sleep(1) in some of your loops might be the cause of it.  But if you
>have a multithreaded
>program on Windows that is using 100% of the CPU and you don't know
>why, I have some tricks to tell you where.  They tell you which thread is using
>that CPU and rule out the others.
>
>The solution is to create a linked list of every thread in the box,
>yours and pjsip's.  For your own threads, you can do this with a wrapper
>function around CreateThread().
>For pjsip's threads, have pjsip's win32 create-thread
>function call a function which simply adds the thread pjsip
>created to your linked list.
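
A minimal sketch of such a thread list, under stated assumptions: all names here (thread_rec, thread_registry_add, thread_registry_for_each) are hypothetical, the handle is kept as an opaque pointer so the same code can hold a Win32 HANDLE or anything else, and records are never removed, which keeps the lock-free push simple. A real application would also need a way to mark threads that have exited.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* One record per thread. `handle` is opaque; on Win32 it would hold the
 * HANDLE returned by CreateThread(). */
typedef struct thread_rec {
    void *handle;
    const char *tag;             /* human-readable label, e.g. "media" */
    struct thread_rec *next;
} thread_rec;

static _Atomic(thread_rec *) g_threads = NULL;

/* Push a record onto the list head with a compare-and-swap loop, so any
 * thread-creation wrapper can call it without a separate lock.
 * Returns 0 on success, -1 if allocation fails. */
static int thread_registry_add(void *handle, const char *tag)
{
    assert(tag != NULL);
    thread_rec *rec = malloc(sizeof *rec);
    if (rec == NULL)
        return -1;
    rec->handle = handle;
    rec->tag = tag;
    rec->next = atomic_load(&g_threads);
    /* On failure rec->next is refreshed with the current head; retry. */
    while (!atomic_compare_exchange_weak(&g_threads, &rec->next, rec))
        ;
    return 0;
}

typedef void (*thread_visit_fn)(const thread_rec *rec, void *user);

/* Walk the list, newest record first, calling `visit` (if not NULL) on
 * each record. Returns the number of records seen. */
static int thread_registry_for_each(thread_visit_fn visit, void *user)
{
    int count = 0;
    for (thread_rec *rec = atomic_load(&g_threads); rec != NULL; rec = rec->next) {
        if (visit != NULL)
            visit(rec, user);
        count++;
    }
    return count;
}
```

The periodic report would then pass a visitor that runs the per-thread profiling function on rec->handle and prints rec->tag next to the result.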
>
>OK, you have your linked list.  Next step:
>at regular intervals, say once a minute, call a function which,
>for each thread in your linked list, calls this function:
>#include <windows.h>
>#include <string.h>   /* memcpy */
>
>/************************************************************************/
>/*                                                                      */
>/* Given a thread handle, this function determines the thread's share   */
>/* (in percent) of the process's user-mode and kernel-mode CPU time.    */
>/*                                                                      */
>/************************************************************************/
>int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode)
>{
>     __int64 user_percent;
>     __int64 kernel_percent;
>
>     LARGE_INTEGER process_utime;
>     LARGE_INTEGER process_ktime;
>     LARGE_INTEGER thread_utime;
>     LARGE_INTEGER thread_ktime;
>
>     FILETIME process_creation_time;
>     FILETIME process_exit_time;
>     FILETIME process_user_time;
>     FILETIME process_kernel_time;
>
>     FILETIME thread_creation_time;
>     FILETIME thread_exit_time;
>     FILETIME thread_user_time;
>     FILETIME thread_kernel_time;
>
>     BOOL retval1, retval2;
>     int retval=0; // assume failure.
>
>
>     retval1=GetProcessTimes(GetCurrentProcess(),
>                     &process_creation_time,
>                     &process_exit_time,
>                     &process_kernel_time,
>                     &process_user_time);
>
>     retval2=GetThreadTimes(thread,
>                    &thread_creation_time,
>                    &thread_exit_time,
>                    &thread_kernel_time,
>                    &thread_user_time);
>
>     if ((retval1) && (retval2)) // if both functions worked.
>     {
>         memcpy(&process_utime, &process_user_time, sizeof(FILETIME));
>         memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME));
>         memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME));
>         memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME));
>
>         if (process_utime.QuadPart==0)
>             user_percent=0;
>         else
>             user_percent=thread_utime.QuadPart*100/process_utime.QuadPart;
>         if (process_ktime.QuadPart==0)
>             kernel_percent=0;
>         else
>             kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart;
>
>         *user_mode=(int)user_percent;
>         *kernel_mode=(int)kernel_percent;
>         retval=1;
>     }
>     return(retval);
>}
>You will find most threads will be 0 and 0 and one will be 
>significantly higher say 30 and 40.  The one that is significantly 
>higher is the offender.
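
One thing to keep in mind: GetThreadTimes() and GetProcessTimes() report totals accumulated since creation, so the percentages above are lifetime shares, and a thread that only recently started spinning can be diluted by a long quiet history. A possible refinement, sketched here with hypothetical names (cpu_sample, interval_percent), is to keep the previous sample for each thread and compute the share over the last interval only. The tick values are plain integers in this sketch; on Win32 they would come from the FILETIME values (user plus kernel) read as 64-bit numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative CPU time in 100-nanosecond units, as FILETIME reports it. */
typedef struct cpu_sample {
    uint64_t thread_ticks;    /* user + kernel time of one thread     */
    uint64_t process_ticks;   /* user + kernel time of the process    */
} cpu_sample;

/* Share (in percent, rounded down) of the process's CPU time that the
 * thread consumed between two samples. Returns 0 when the process used
 * no CPU during the interval. */
static int interval_percent(cpu_sample prev, cpu_sample cur)
{
    /* Counters are cumulative, so they must not go backwards. */
    assert(cur.thread_ticks >= prev.thread_ticks);
    assert(cur.process_ticks >= prev.process_ticks);

    uint64_t thread_delta = cur.thread_ticks - prev.thread_ticks;
    uint64_t process_delta = cur.process_ticks - prev.process_ticks;

    if (process_delta == 0)
        return 0;
    return (int)(thread_delta * 100 / process_delta);
}
```

For example, a thread whose totals go from 1000 to 51000 ticks while the process goes from 100000 to 160000 used 83% of the interval, although its lifetime share is only 31%.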
>
>Hope this helps and makes sense.
>
>On the deadlock debugging I did: I raised the pjsip
>debug level to 6 at both compile time and runtime.
>But that was still not enough to tell me what I needed to know.  So
>I augmented the mutex lock/unlock/try/destroy functions in pjlib
>with a version that received the caller's __FILE__ and
>__LINE__.  __FILE__ gives the source module of the caller, and
>__LINE__ gives the
>line number the function was called from.
>
>I did something like this:
>#define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__)
>
>void _mutex_lock(mutex *lock, const char *file, int line)
>{
>    // in here I added file and line to the level 6 debug output.
>}
>
>Run that and you know where you blocked and where you locked prior 
>to that which gives the complete story.
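
For readers on Linux, the same idea can be sketched with POSIX mutexes. This is not pjlib code: traced_mutex, traced_lock_at, traced_unlock_at and the two macros are hypothetical names, and the log format is an assumption. The wrapper logs before it blocks and again after it acquires, so the last "waiting" line of a stuck thread and the last "acquired" line for the same mutex give both sides of the story.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* A mutex that remembers where it was last acquired. */
typedef struct traced_mutex {
    pthread_mutex_t mtx;
    const char *owner_file;   /* __FILE__ of the last successful lock */
    int owner_line;           /* __LINE__ of the last successful lock */
} traced_mutex;

#define TRACED_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, 0 }

/* Lock with call-site logging. Returns the pthread_mutex_lock() result. */
static int traced_lock_at(traced_mutex *m, const char *file, int line)
{
    assert(m != NULL);
    fprintf(stderr, "%s:%d waiting for mutex %p\n", file, line, (void *)m);
    int rc = pthread_mutex_lock(&m->mtx);
    if (rc == 0) {
        /* Written while holding the mutex, so no extra lock is needed. */
        m->owner_file = file;
        m->owner_line = line;
        fprintf(stderr, "%s:%d acquired mutex %p\n", file, line, (void *)m);
    }
    return rc;
}

/* Unlock with call-site logging. Returns the pthread_mutex_unlock() result. */
static int traced_unlock_at(traced_mutex *m, const char *file, int line)
{
    assert(m != NULL);
    fprintf(stderr, "%s:%d releasing mutex %p\n", file, line, (void *)m);
    return pthread_mutex_unlock(&m->mtx);
}

/* Call sites use these macros so file and line are filled in for them. */
#define traced_lock(m)   traced_lock_at((m), __FILE__, __LINE__)
#define traced_unlock(m) traced_unlock_at((m), __FILE__, __LINE__)
```

A trylock variant would follow the same shape around pthread_mutex_trylock(), recording the call site only when the call returns 0.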
>
>David Clark
>ps. Note this does produce a ton of debug logs.  We are talking 
>gigabytes.   In my applications I have been able to
>give mutex locking information as part of the thread usage 
>reports.  I list what thread is blocked on which mutex
>where the block happened and where ownership was last successfully 
>obtained.  The complete story on one line.
>I have not put this into pjsip yet.  It involved replacing all mutex 
>functions, and since my application never uses
>a try mutex function, I don't have that one.  And pjsip uses that 
>heavily.  So I would need to add that function.
>
>At 02:54 AM 5/15/2009, M.S. wrote:
>>I think you have great skills in debugging deadlock situations in pjsip.
>>Can you give me some good advice on debugging pjsip/pjsua mutex stuff
>>(log level, debug outputs, etc.)?
>>
>>I have a situation where pjsip reaches 100% processor use. I think I
>>have a deadlock.
>>
>>   regards
>>     mark
>>
>>
>>From: David Clark <vdc1048 at tx.rr.com>
>>To: pjsip list <pjsip at lists.pjsip.org>; pjsip list
>><pjsip at lists.pjsip.org>
>>Sent: Thursday, May 14, 2009, 22:22:56
>>Subject: [pjsip] pjsip version 1.1 gotcha #2 with work around.
>>
>>This is similar to gotcha #1, so I will go into less detail.  If
>>the on_call_state() handler, which gets hangup
>>notifications, calls pjsua_recorder_destroy(), the box can stop
>>communicating after that for the following reason:
>>1) pjsua_recorder_destroy() locks the pjsua mutex.
>>2) pjsua_recorder_destroy() locks the conf mutex indirectly by calling
>>pjsua_conf_remove_port().
>>
>>But if the conf mutex is already locked in the port audio clock thread's
>>get_frame() at the point of the call,
>>you will block waiting for the conf mutex, and other threads will never
>>get the pjsua mutex.
>>
>>Workaround: don't call pjsua_recorder_destroy() in on_call_state().
>>Instead, signal the application thread to wake up and do it
>>in the application thread.
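
One way to structure that workaround is sketched below, under stated assumptions: the callback only records the recorder id in a small queue, and the application thread drains the queue later and performs the destroy. The names pending_queue, pending_push, pending_drain and stub_destroy are hypothetical. In a real program the destroy callback would wrap pjsua_recorder_destroy(), and the application thread would need to be woken (for example by a condition variable or its own event loop), which this sketch leaves out.

```c
#include <assert.h>
#include <pthread.h>

#define PENDING_MAX 64

/* Recorder ids waiting to be destroyed by the application thread. */
typedef struct pending_queue {
    pthread_mutex_t lock;
    int ids[PENDING_MAX];
    int count;
} pending_queue;

#define PENDING_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, {0}, 0 }

/* Called from the callback. Takes only the queue's own lock and never
 * calls into the library. Returns 0 on success, -1 if the queue is full. */
static int pending_push(pending_queue *q, int recorder_id)
{
    int rc = -1;
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < PENDING_MAX) {
        q->ids[q->count++] = recorder_id;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Called from the application thread. Copies the ids out under the lock,
 * then calls `destroy` with the queue lock released. Returns the number
 * of ids handed to `destroy`. */
static int pending_drain(pending_queue *q, void (*destroy)(int recorder_id))
{
    int ids[PENDING_MAX];
    int n;
    assert(q != NULL && destroy != NULL);
    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (int i = 0; i < n; i++)
        ids[i] = q->ids[i];
    q->count = 0;
    pthread_mutex_unlock(&q->lock);
    for (int i = 0; i < n; i++)
        destroy(ids[i]);
    return n;
}

/* Stand-in for the real destroy call, used only to exercise the queue. */
static int g_destroyed = 0;
static int g_last_destroyed = -1;
static void stub_destroy(int recorder_id)
{
    g_destroyed++;
    g_last_destroyed = recorder_id;
}
```

The point of the design is that the callback never takes the library's mutexes in a new order; the destroy runs later from a thread that holds none of them.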
>>
>>David Clark
>>
>>_______________________________________________
>>Visit our blog: http://blog.pjsip.org
>>
>>pjsip mailing list
>>pjsip at lists.pjsip.org
>>http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
>
>