debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



http://trac.pjsip.org/repos/wiki/PJSUA_Locks

Is this guide line useful?
Indeed, it is still possible to create some SIP message flow to cause PJSUA
or PJSIP-UA deadlock.But this is another topic.

regards,
Gang

On Mon, May 18, 2009 at 4:11 PM, M.S. <hamstiede at yahoo.de> wrote:

>  thank you for your detailed debug summary. I am using linux but __FILE__
> and __LINE__ is working too.
> For the thread tagging, i will use the linux pids (process-id).
>
> I thought i have a deadlock because:
> - if i have 1 or 2 connections(calls) i have < 5 % processor use (small arm
> system).
> -if i have 4 connections if get (only for one thread) 80-100% processor
> use. (after a view seconds).
> (for each media stream i use a separate conference bridge. all this threads
> will used only (<2% cpu))
> - after this i get some deadlock informations form the stack like:
>
> possible deadlock ........
>
> Questions:
> it is possible that this could happened  if  any stack state machine checks
> his own  mutex like "bool IsPjsuaLocked()" ?
>
> I think i will build this stuff your wrote in the pjlib (thread tagging and
> detaild mutex logging with __FILE__ and __LINE__)
>
>
> thank you
>
>     Mark
>
>
>  ------------------------------
> *Von:* David Clark <vdc1048 at tx.rr.com>
> *An:* pjsip list <pjsip at lists.pjsip.org>; pjsip list <
> pjsip at lists.pjsip.org>
> *Gesendet:* Samstag, den 16. Mai 2009, 09:08:42 Uhr
> *Betreff:* [pjsip] debug tips for cpu usage going wrong, deadlock issues,
> within pjsip or not. win32 specific thread profiling function used.
>
> No 100% might well be a bug but most likely not deadlock.  deadlock is
> where you are locked by the OS waiting for a mutex you will never get.
> This locked state uses 0% cpu.  The usual symptom I see is after x number
> of calls.   pjsip just does not communicate with anyone.  Even SJPhone
> the sip phone application is ignored.
>
> But how to resolve your cpu goes to 100% issue.  Yea not putting sleep(1)
> in some of your loops might be the cause of it.  But if you have a
> multithread
> program in windows that is using 100% of the cpu and you don't know why.  I
> got some tricks to tell you where.  Tell you which thread is using
> that cpu and rule out others.
>
> The solution is to create a linked list of every thread in the box yorus
> and pjsip threads.  You can do this with a wrapper function around
> CreateThread()
> for your stuff.  You can do this by having pjsip's win32 create thread
> function call a function which simply adds the thread pjsip created to your
> linked list.
>
> Ok got your linked list.  Next step.
> At regular intervals like say once a minute call a function which for each
> thread in your linked list calls this function:
> /************************************************************************/
> /*                                                                      */
> /* Given a thread handle this function will determine the percent of    */
> /* time the thread has spent in user mode and kernal mode.              */
> /*                                                                      */
> /************************************************************************/
> int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode)
> {
>     __int64 user_percent;
>     __int64 kernel_percent;
>
>     LARGE_INTEGER process_utime;
>     LARGE_INTEGER process_ktime;
>     LARGE_INTEGER thread_utime;
>     LARGE_INTEGER thread_ktime;
>
>     FILETIME process_creation_time;
>     FILETIME process_exit_time;
>     FILETIME process_user_time;
>     FILETIME process_kernel_time;
>
>     FILETIME thread_creation_time;
>     FILETIME thread_exit_time;
>     FILETIME thread_user_time;
>     FILETIME thread_kernel_time;
>
>     BOOL retval1, retval2;
>     int retval=0; // assume failure.
>
>
>     retval1=GetProcessTimes(GetCurrentProcess(),
>                     &process_creation_time,
>                     &process_exit_time,
>                     &process_kernel_time,
>                     &process_user_time);
>
>     retval2=GetThreadTimes(thread,
>                    &thread_creation_time,
>                    &thread_exit_time,
>                    &thread_kernel_time,
>                    &thread_user_time);
>
>     if ((retval1) && (retval2)) // if both functions worked.
>     {
>         memcpy(&process_utime, &process_user_time, sizeof(FILETIME));
>         memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME));
>         memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME));
>         memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME));
>
>                   if (process_utime.QuadPart==0)
>                            user_percent=0;
>                   else
>
> user_percent=thread_utime.QuadPart*100/process_utime.QuadPart;
>         if (process_ktime.QuadPart==0)
>                  kernel_percent=0;
>         else
>
> kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart;
>
>         *user_mode=(int)user_percent;
>         *kernel_mode=(int)kernel_percent;
>         retval=1;
>     }
>     return(retval);
> }
> You will find most threads will be 0 and 0 and one will be significantly
> higher say 30 and 40.  The one that is significantly higher is the offender.
>
> Hope this helps and makes sense.
>
> On the deadlock debugging I did do.  I did this.  I moved pjsip debug level
> at compile time and runtime to 6.
> But that was still not enough to tell me what I needed to know.  So I
> augumented the mutex lock/unlock/try/destroy functions in pjilib
> with a version that gave the callers __FILE__, and __LINE__.  __FILE__
> gives the source module of the caller, and __LINE__ gives the
> line number the function was called from.
>
> I did something like this:
> #define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__)
>
> _mutex_lock(multex *lock, char *file, int line)
> {
>    // in here I added file, and line to the level 6 debug output.
> }
>
> Run that and you know where you blocked and where you locked prior to that
> which gives the complete story.
>
> David Clark
> ps. Note this does produce a ton of debug logs.  We are talking
> gigabytes.   In my applications I have been able to
> give mutex locking information as part of the thread usage reports.  I list
> what thread is blocked on which mutex
> where the block happened and where ownership was last successfully
> obtained.  The complete story on one line.
> I have not put this into pjsip yet.  It involved replacing all mutex
> functions, and since my application never uses
> a try mutex function, I don't have that one.  And pjsip uses that heavily.
> So I would need to add that function.
>
> At 02:54 AM 5/15/2009, M.S. wrote:
>
> i think you have great skills to debug  deadlock situations in pjsip.
> can you give me a  good advice  to debug pjsip/pjsua mutex stuff
> (log-level, debug outputs etc.)
>
> i have a situation that pjsip got 100% processor use. i think i have a
> deadlock.
>
>   regards
>     mark
>
>
> *Von:* David Clark <vdc1048 at tx.rr.com>
> *An:* pjsip list <pjsip at lists.pjsip.org>; pjsip list <
> pjsip at lists.pjsip.org>
> *Gesendet:* Donnerstag, den 14. Mai 2009, 22:22:56 Uhr
> *Betreff:* [pjsip] pjsip version 1.1 gotcha #2 with work around.
>
> This is similar to gotcha #1.  So I will go into less detail.  If the
> on_call_state() handler which gets hangup
> notifications calls pjsua_recorder_destroy().  The box can stop
> communicating after that for the following reason.
> 1) pjsua_recorder_destroy() locks pjsua mutex.
> 2) pjsua_recorder_destroy() locals conf mutex indirectly by calling
> pjsua_conf_remove_port().
>
> But if conf mutex is locked in port audio clock thread get_frame() at the
> point of the call.
>
> You will block waiting for conf mutex and other threads will never get
> pjsua mutex.
>
> Work around: don't call pjsua_recorder_destroy() in on_call_state() signal
> the application thread to wait up and do it
> in the application thread.
>
> David Clark
>
> _______________________________________________
> Visit our blog: http://blog.pjsip.org
>
> pjsip mailing list
> pjsip at lists.pjsip.org
> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
>
>
>
> _______________________________________________
> Visit our blog: http://blog.pjsip.org
>
> pjsip mailing list
> pjsip at lists.pjsip.org
> http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090520/0ec817fd/attachment-0001.html>


[Index of Archives]     [Asterisk Users]     [Asterisk App Development]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [Linux API]
  Powered by Linux