debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used.

gangban.lau@xxxxxxxxx (Gang Liu) · Wed, 20 May 2009 15:47:29 +0800

Everyone has their own preferred way to debug and trace problems. I have
nothing to say against that.

But how many people read this document (the PJSUA_Locks wiki page linked
below) before using the pjsua API? Not many, I think.

I will keep quiet if the advice is not welcome.

I tested on a dual quad-core Xeon server with 1000 cps and 20000 channels.
Is that enough?

I know there may be a difference between the pjsip-ua API and the pjsua API.
But the key is that our own program logic must follow this guide, and that we
spend enough time understanding why we need to follow it. Then life becomes
much easier.

regards,
Gang

On Wed, May 20, 2009 at 3:26 PM, M.S. <hamstiede at yahoo.de> wrote:

>  Hello Gang,
>
> Reading the wiki guideline is the first step toward resolving these
> problems, but on a multithreaded, hyper-threaded, or multiprocessor system
> with many connections for one agent, race conditions become more likely.
> I think I have read the whole pjsip mail archive, and every month I hear
> about deadlock situations, infinite loops, and so on.
> In my own applications I often use __FILE__ and __LINE__ debug output for
> mutex systems. It works fine!
>
>
>
> regards
>    mark
>
>
>  ------------------------------
> *From:* Gang Liu <gangban.lau at gmail.com>
> *To:* pjsip list <pjsip at lists.pjsip.org>
> *Sent:* Wednesday, 20 May 2009, 05:30:51
> *Subject:* Re: [pjsip] debug tips for cpu usage going wrong, deadlock
> issues, within pjsip or not. win32 specific thread profiling function used.
>
> http://trac.pjsip.org/repos/wiki/PJSUA_Locks
>
> Is this guideline useful?
> Admittedly, it is still possible to construct a SIP message flow that causes
> PJSUA or PJSIP-UA to deadlock. But that is another topic.
>
> regards,
> Gang
>
> On Mon, May 18, 2009 at 4:11 PM, M.S. <hamstiede at yahoo.de> wrote:
>
>>  Thank you for your detailed debug summary. I am using Linux, but __FILE__
>> and __LINE__ work there too.
>> For the thread tagging, I will use the Linux PIDs (process IDs).
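
For the Linux case mentioned here, a minimal sketch of the per-thread measurement, assuming POSIX threads on glibc: pthread_getcpuclockid() returns a clock for a specific thread and clock_gettime() reads it. The function names thread_cpu_ns and process_cpu_ns are invented for this example and are not part of pjlib.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to nanoseconds. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
    assert(ts->tv_nsec < 1000000000L);  /* invariant of struct timespec */
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* CPU time consumed so far by thread t, in nanoseconds; -1 on failure. */
int64_t thread_cpu_ns(pthread_t t)
{
    clockid_t clock_id;
    struct timespec ts;

    if (pthread_getcpuclockid(t, &clock_id) != 0)
        return -1;
    if (clock_gettime(clock_id, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}

/* CPU time consumed so far by the whole process; -1 on failure. */
int64_t process_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return timespec_to_ns(&ts);
}
```

Sampling thread_cpu_ns() for each known thread at regular intervals should show which thread is consuming the CPU, in the same spirit as the Win32 function quoted further down.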
>>
>> I thought I had a deadlock because:
>> - With 1 or 2 connections (calls), processor use is < 5% (small
>> ARM system).
>> - With 4 connections, one thread alone reaches 80-100% processor
>> use after a few seconds.
>> (I use a separate conference bridge for each media stream. Each of these
>> threads uses < 2% CPU.)
>> - After this, I get deadlock messages from the stack such as:
>>
>> possible deadlock ........
>>
>> Question:
>> Could this happen if a stack state machine checks its own mutex
>> with something like "bool IsPjsuaLocked()"?
>>
>> I think I will build what you wrote into pjlib (thread tagging
>> and detailed mutex logging with __FILE__ and __LINE__).
>>
>>
>> thank you
>>
>>     Mark
>>
>>
>>  ------------------------------
>> *From:* David Clark <vdc1048 at tx.rr.com>
>> *To:* pjsip list <pjsip at lists.pjsip.org>; pjsip list <
>> pjsip at lists.pjsip.org>
>> *Sent:* Saturday, 16 May 2009, 09:08:42
>> *Subject:* [pjsip] debug tips for cpu usage going wrong, deadlock issues,
>> within pjsip or not. win32 specific thread profiling function used.
>>
>> No, 100% CPU might well be a bug, but it is most likely not a deadlock.
>> A deadlock is when the OS blocks you waiting for a mutex you will never get.
>> That blocked state uses 0% CPU. The usual symptom I see is that after some
>> number of calls, pjsip just stops communicating with anyone. Even SJPhone,
>> the SIP phone application, is ignored.
>>
>> As for how to resolve your 100% CPU issue: a missing sleep(1)
>> in some of your loops might be the cause. But if you have a
>> multithreaded program on Windows that is using 100% of the CPU and you
>> don't know why, I have some tricks that tell you which thread is using
>> that CPU and let you rule out the others.
>>
>> The solution is to create a linked list of every thread in the application,
>> both yours and pjsip's. For your own threads, you can do this with a
>> wrapper function around CreateThread(). For pjsip's threads, have pjsip's
>> win32 thread-creation function call a function that simply adds the thread
>> pjsip created to your linked list.
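
The registry described above could be sketched like this. It uses POSIX threads as a portable stand-in for CreateThread(); registry_add, registry_create_thread, and registry_for_each are hypothetical names, and the hook inside pjsip's own thread-creation code is not shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One node per thread we want to profile later. */
typedef struct thread_node {
    pthread_t           handle;
    char                name[32];   /* human-readable tag for reports */
    struct thread_node *next;
} thread_node;

static thread_node    *g_threads = NULL;
static pthread_mutex_t g_threads_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add an already-created thread to the list; returns 0 on success. */
int registry_add(pthread_t handle, const char *name)
{
    thread_node *node;

    assert(name != NULL);
    node = calloc(1, sizeof(*node));
    if (node == NULL)
        return -1;
    node->handle = handle;
    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(node->name, name, sizeof(node->name) - 1);

    pthread_mutex_lock(&g_threads_lock);
    node->next = g_threads;
    g_threads = node;
    pthread_mutex_unlock(&g_threads_lock);
    return 0;
}

/* Wrapper to call instead of pthread_create() for your own threads. */
int registry_create_thread(pthread_t *out, const char *name,
                           void *(*entry)(void *), void *arg)
{
    int rc = pthread_create(out, NULL, entry, arg);
    if (rc != 0)
        return rc;
    return registry_add(*out, name);
}

/* Call fn (if not NULL) for every registered thread; returns the count. */
int registry_for_each(void (*fn)(const thread_node *, void *), void *user)
{
    int count = 0;

    pthread_mutex_lock(&g_threads_lock);
    for (thread_node *n = g_threads; n != NULL; n = n->next) {
        if (fn != NULL)
            fn(n, user);
        count++;
    }
    pthread_mutex_unlock(&g_threads_lock);
    return count;
}
```

The periodic report would then pass a callback to registry_for_each() that measures each thread and prints its name next to the numbers.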
>>
>> OK, you have your linked list. Next step:
>> at regular intervals, say once a minute, call a function that, for each
>> thread in your linked list, calls this function:
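>> #include <string.h>   /* memcpy */
>> #include <windows.h>  /* GetProcessTimes, GetThreadTimes, FILETIME */
>>
>> /************************************************************************/
>> /*                                                                      */
>> /* Given a thread handle, this function determines what percent of the  */
>> /* process's user-mode and kernel-mode CPU time the thread has used.    */
>> /* Returns 1 on success, 0 on failure.                                  */
>> /*                                                                      */
>> /************************************************************************/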
>> int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode)
>> {
>>     __int64 user_percent;
>>     __int64 kernel_percent;
>>
>>     LARGE_INTEGER process_utime;
>>     LARGE_INTEGER process_ktime;
>>     LARGE_INTEGER thread_utime;
>>     LARGE_INTEGER thread_ktime;
>>
>>     FILETIME process_creation_time;
>>     FILETIME process_exit_time;
>>     FILETIME process_user_time;
>>     FILETIME process_kernel_time;
>>
>>     FILETIME thread_creation_time;
>>     FILETIME thread_exit_time;
>>     FILETIME thread_user_time;
>>     FILETIME thread_kernel_time;
>>
>>     BOOL retval1, retval2;
>>     int retval=0; // assume failure.
>>
>>
>>     retval1=GetProcessTimes(GetCurrentProcess(),
>>                     &process_creation_time,
>>                     &process_exit_time,
>>                     &process_kernel_time,
>>                     &process_user_time);
>>
>>     retval2=GetThreadTimes(thread,
>>                    &thread_creation_time,
>>                    &thread_exit_time,
>>                    &thread_kernel_time,
>>                    &thread_user_time);
>>
>>     if ((retval1) && (retval2)) // if both functions worked.
>>     {
>>         memcpy(&process_utime, &process_user_time, sizeof(FILETIME));
>>         memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME));
>>         memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME));
>>         memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME));
>>
>>         if (process_utime.QuadPart==0)
>>             user_percent=0;
>>         else
>>             user_percent=thread_utime.QuadPart*100/process_utime.QuadPart;
>>         if (process_ktime.QuadPart==0)
>>             kernel_percent=0;
>>         else
>>             kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart;
>>
>>         *user_mode=(int)user_percent;
>>         *kernel_mode=(int)kernel_percent;
>>         retval=1;
>>     }
>>     return(retval);
>> }
>> You will find that most threads report 0 and 0, and one is significantly
>> higher, say 30 and 40. The one that is significantly higher is the offender.
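
One thing to keep in mind: thread_percent() divides totals accumulated since the process started, so a thread that only recently began spinning can be diluted by earlier history. A possible variation, sketched here under the assumption that you keep the previous sample for each thread, is to compare the deltas between two samples. cpu_sample and interval_percent are names invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Cumulative CPU time counters in any consistent unit (e.g. 100 ns ticks). */
typedef struct cpu_sample {
    uint64_t thread_time;   /* time consumed by the thread so far        */
    uint64_t process_time;  /* time consumed by the whole process so far */
} cpu_sample;

/*
 * Percent of the process's CPU time that the thread used between two
 * samples. Returns 0 when the process used no CPU in the interval.
 */
int interval_percent(const cpu_sample *prev, const cpu_sample *cur)
{
    uint64_t thread_delta;
    uint64_t process_delta;

    /* Counters are cumulative, so they must never go backwards. */
    assert(cur->thread_time >= prev->thread_time);
    assert(cur->process_time >= prev->process_time);

    thread_delta = cur->thread_time - prev->thread_time;
    process_delta = cur->process_time - prev->process_time;
    if (process_delta == 0)
        return 0;
    return (int)(thread_delta * 100 / process_delta);
}
```

The same helper can be applied separately to user-mode and kernel-mode counters if both are sampled.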
>>
>> Hope this helps and makes sense.
>>
>> As for the deadlock debugging I did: I raised the pjsip debug
>> level to 6 at both compile time and runtime.
>> But that was still not enough to tell me what I needed to know. So I
>> augmented the mutex lock/unlock/try/destroy functions in pjlib
>> with versions that pass in the caller's __FILE__ and __LINE__. __FILE__
>> gives the source file of the caller, and __LINE__ gives the
>> line number the function was called from.
>>
>> I did something like this:
>> #define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__)
>>
>> void _mutex_lock(mutex *lock, const char *file, int line)
>> {
>>    // in here I added file and line to the level 6 debug output.
>> }
>>
>> Run that and you know where you blocked and where you locked prior to that,
>> which gives the complete story.
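
To make the idea concrete, here is a minimal sketch of such a wrapper built on a plain pthread mutex rather than on pjlib's own mutex type. traced_mutex, traced_lock, and traced_unlock are hypothetical names, and the output goes to stderr instead of the pjsip log.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* A mutex that remembers where it was last acquired. */
typedef struct traced_mutex {
    pthread_mutex_t mutex;
    const char     *owner_file;  /* file of the last successful lock */
    int             owner_line;  /* line of the last successful lock */
} traced_mutex;

#define TRACED_MUTEX_INIT { PTHREAD_MUTEX_INITIALIZER, "(never locked)", 0 }

int traced_lock_at(traced_mutex *m, const char *file, int line)
{
    int rc;

    assert(m != NULL);
    /* Log before blocking, so a hang shows where the caller was waiting. */
    fprintf(stderr, "%s:%d: waiting for mutex %p\n", file, line, (void *)m);
    rc = pthread_mutex_lock(&m->mutex);
    if (rc == 0) {
        /* We own the mutex now, so updating the owner fields is safe. */
        m->owner_file = file;
        m->owner_line = line;
        fprintf(stderr, "%s:%d: acquired mutex %p\n", file, line, (void *)m);
    }
    return rc;
}

int traced_unlock_at(traced_mutex *m, const char *file, int line)
{
    assert(m != NULL);
    /* Still holding the mutex here, so reading the owner fields is safe. */
    fprintf(stderr, "%s:%d: releasing mutex %p (locked at %s:%d)\n",
            file, line, (void *)m, m->owner_file, m->owner_line);
    return pthread_mutex_unlock(&m->mutex);
}

/* Call sites use these macros so the caller's location is captured. */
#define traced_lock(m)   traced_lock_at((m), __FILE__, __LINE__)
#define traced_unlock(m) traced_unlock_at((m), __FILE__, __LINE__)
```

A "waiting" line with no matching "acquired" line marks the blocked caller, and the most recent "acquired" line for the same mutex address names the holder.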
>>
>> David Clark
>> PS. Note that this produces a ton of debug logs. We are talking
>> gigabytes. In my own applications I have been able to
>> include mutex locking information in the thread usage reports. I
>> list which thread is blocked on which mutex,
>> where the block happened, and where ownership was last successfully
>> obtained. The complete story on one line.
>> I have not put this into pjsip yet. It involves replacing all the mutex
>> functions, and since my application never uses
>> a try-mutex function, I don't have that one, while pjsip uses it
>> heavily. So I would need to add that function.
>>
>> At 02:54 AM 5/15/2009, M.S. wrote:
>>
>> I think you have great skill at debugging deadlock situations in pjsip.
>> Can you give me some good advice on debugging pjsip/pjsua mutex issues
>> (log level, debug output, etc.)?
>>
>> I have a situation where pjsip reaches 100% processor use. I think I have a
>> deadlock.
>>
>>   regards
>>     mark
>>
>>
>> *From:* David Clark <vdc1048 at tx.rr.com>
>> *To:* pjsip list <pjsip at lists.pjsip.org>; pjsip list <
>> pjsip at lists.pjsip.org>
>> *Sent:* Thursday, 14 May 2009, 22:22:56
>> *Subject:* [pjsip] pjsip version 1.1 gotcha #2 with work around.
>>
>> This is similar to gotcha #1, so I will go into less detail. If the
>> on_call_state() handler, which gets hangup notifications, calls
>> pjsua_recorder_destroy(), the box can stop communicating after that
>> for the following reason:
>> 1) pjsua_recorder_destroy() locks the pjsua mutex.
>> 2) pjsua_recorder_destroy() locks the conf mutex indirectly by calling
>> pjsua_conf_remove_port().
>>
>> But if the conf mutex is held by the PortAudio clock thread in get_frame()
>> at the point of the call, you will block waiting for the conf mutex, and
>> other threads will never get the pjsua mutex.
>>
>> Workaround: don't call pjsua_recorder_destroy() in on_call_state(). Signal
>> the application thread to wake up and do it
>> in the application thread.
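
The workaround could be sketched as a small hand-off queue: the callback only records the id, and the application thread does the actual cleanup later. This is an illustration with generic names (defer_destroy, wait_deferred, run_deferred, example_destroy are all hypothetical); in a real application the cleanup function passed in would be the one that calls pjsua_recorder_destroy().

```c
#include <assert.h>
#include <pthread.h>

#define PENDING_MAX 64

/* Fixed-size queue of ids whose cleanup was requested from a callback. */
static int             g_pending[PENDING_MAX];
static int             g_pending_count = 0;
static pthread_mutex_t g_pending_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_pending_cond = PTHREAD_COND_INITIALIZER;

/* Called from the callback: record the id, wake the app thread, return.
 * Returns 0 on success, -1 if the queue is full. */
int defer_destroy(int id)
{
    int rc = -1;

    pthread_mutex_lock(&g_pending_lock);
    if (g_pending_count < PENDING_MAX) {
        g_pending[g_pending_count++] = id;
        pthread_cond_signal(&g_pending_cond);
        rc = 0;
    }
    pthread_mutex_unlock(&g_pending_lock);
    return rc;
}

/* Called from the application thread: block until work is queued. */
void wait_deferred(void)
{
    pthread_mutex_lock(&g_pending_lock);
    while (g_pending_count == 0)
        pthread_cond_wait(&g_pending_cond, &g_pending_lock);
    pthread_mutex_unlock(&g_pending_lock);
}

/* Called from the application thread: drain the queue, then run
 * destroy_fn on each id with no queue lock held. Returns the count. */
int run_deferred(void (*destroy_fn)(int id))
{
    int batch[PENDING_MAX];
    int n;

    assert(destroy_fn != NULL);
    pthread_mutex_lock(&g_pending_lock);
    n = g_pending_count;
    for (int i = 0; i < n; i++)
        batch[i] = g_pending[i];
    g_pending_count = 0;
    pthread_mutex_unlock(&g_pending_lock);

    for (int i = 0; i < n; i++)
        destroy_fn(batch[i]);
    return n;
}

/* Stand-in cleanup for this sketch; it only remembers the last id. */
static int g_last_destroyed = -1;
void example_destroy(int id) { g_last_destroyed = id; }
```

The point of the design is that the callback never takes any lock other than the queue's own, so it cannot take part in the lock ordering problem described above.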
>>
>> David Clark
>>
>> _______________________________________________
>> Visit our blog: http://blog.pjsip.org
>>
>> pjsip mailing list
>> pjsip at lists.pjsip.org
>> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org