debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



No 100% might well be a bug but most likely not deadlock.  deadlock 
is where you are locked by the OS waiting for a mutex you will never get.
This locked state uses 0% cpu.  The usual symptom I see is after x 
number of calls.   pjsip just does not communicate with anyone.  Even SJPhone
the sip phone application is ignored.

But how to resolve your cpu goes to 100% issue.  Yea not putting 
sleep(1) in some of your loops might be the cause of it.  But if you 
have a multithread
program in windows that is using 100% of the cpu and you don't know 
why.  I got some tricks to tell you where.  Tell you which thread is using
that cpu and rule out others.

The solution is to create a linked list of every thread in the box 
yorus and pjsip threads.  You can do this with a wrapper function 
around CreateThread()
for your stuff.  You can do this by having pjsip's win32 create 
thread function call a function which simply adds the thread pjsip 
created to your linked list.

Ok got your linked list.  Next step.
At regular intervals like say once a minute call a function which for 
each thread in your linked list calls this function:
/************************************************************************/
/*                                                                      */
/* Given a thread handle this function will determine the percent of    */
/* time the thread has spent in user mode and kernal mode.              */
/*                                                                      */
/************************************************************************/
int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode)
{
     __int64 user_percent;
     __int64 kernel_percent;

     LARGE_INTEGER process_utime;
     LARGE_INTEGER process_ktime;
     LARGE_INTEGER thread_utime;
     LARGE_INTEGER thread_ktime;

     FILETIME process_creation_time;
     FILETIME process_exit_time;
     FILETIME process_user_time;
     FILETIME process_kernel_time;

     FILETIME thread_creation_time;
     FILETIME thread_exit_time;
     FILETIME thread_user_time;
     FILETIME thread_kernel_time;

     BOOL retval1, retval2;
     int retval=0; // assume failure.


     retval1=GetProcessTimes(GetCurrentProcess(),
                     &process_creation_time,
                     &process_exit_time,
                     &process_kernel_time,
                     &process_user_time);

     retval2=GetThreadTimes(thread,
                    &thread_creation_time,
                    &thread_exit_time,
                    &thread_kernel_time,
                    &thread_user_time);

     if ((retval1) && (retval2)) // if both functions worked.
     {
         memcpy(&process_utime, &process_user_time, sizeof(FILETIME));
         memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME));
         memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME));
         memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME));

                 if (process_utime.QuadPart==0)
                         user_percent=0;
                 else
                 user_percent=thread_utime.QuadPart*100/process_utime.QuadPart;
         if (process_ktime.QuadPart==0)
                 kernel_percent=0;
         else
                 kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart;

         *user_mode=(int)user_percent;
         *kernel_mode=(int)kernel_percent;
         retval=1;
     }
     return(retval);
}
You will find most threads will be 0 and 0 and one will be 
significantly higher say 30 and 40.  The one that is significantly 
higher is the offender.

Hope this helps and makes sense.

On the deadlock debugging I did do.  I did this.  I moved pjsip debug 
level at compile time and runtime to 6.
But that was still not enough to tell me what I needed to know.  So I 
augumented the mutex lock/unlock/try/destroy functions in pjilib
with a version that gave the callers __FILE__, and 
__LINE__.  __FILE__ gives the source module of the caller, and 
__LINE__ gives the
line number the function was called from.

I did something like this:
#define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__)

_mutex_lock(multex *lock, char *file, int line)
{
    // in here I added file, and line to the level 6 debug output.
}

Run that and you know where you blocked and where you locked prior to 
that which gives the complete story.

David Clark
ps. Note this does produce a ton of debug logs.  We are talking 
gigabytes.   In my applications I have been able to
give mutex locking information as part of the thread usage 
reports.  I list what thread is blocked on which mutex
where the block happened and where ownership was last successfully 
obtained.  The complete story on one line.
I have not put this into pjsip yet.  It involved replacing all mutex 
functions, and since my application never uses
a try mutex function, I don't have that one.  And pjsip uses that 
heavily.  So I would need to add that function.

At 02:54 AM 5/15/2009, M.S. wrote:
>i think you have great skills to debug  deadlock situations in pjsip.
>can you give me a  good advice  to debug pjsip/pjsua mutex stuff 
>(log-level, debug outputs etc.)
>
>i have a situation that pjsip got 100% processor use. i think i have 
>a deadlock.
>
>   regards
>     mark
>
>
>Von: David Clark <vdc1048 at tx.rr.com>
>An: pjsip list <pjsip at lists.pjsip.org>; pjsip list <pjsip at lists.pjsip.org>
>Gesendet: Donnerstag, den 14. Mai 2009, 22:22:56 Uhr
>Betreff: [pjsip] pjsip version 1.1 gotcha #2 with work around.
>
>This is similar to gotcha #1.  So I will go into less detail.  If 
>the on_call_state() handler which gets hangup
>notifications calls pjsua_recorder_destroy().  The box can stop 
>communicating after that for the following reason.
>1) pjsua_recorder_destroy() locks pjsua mutex.
>2) pjsua_recorder_destroy() locals conf mutex indirectly by calling 
>pjsua_conf_remove_port().
>
>But if conf mutex is locked in port audio clock thread get_frame() 
>at the point of the call.
>
>You will block waiting for conf mutex and other threads will never 
>get pjsua mutex.
>
>Work around: don't call pjsua_recorder_destroy() in on_call_state() 
>signal the application thread to wait up and do it
>in the application thread.
>
>David Clark
>
>_______________________________________________
>Visit our blog: http://blog.pjsip.org
>
>pjsip mailing list
>pjsip at lists.pjsip.org
>http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090516/6111d4df/attachment-0001.html>


[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux