all advices are welcome, sorry Gang regards Mark ________________________________ Von: Gang Liu <gangban.lau at gmail.com> An: pjsip list <pjsip at lists.pjsip.org> Gesendet: Mittwoch, den 20. Mai 2009, 09:47:29 Uhr Betreff: Re: [pjsip] debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used. Everyone has his own prefer method to debug, trace problem. I say nothing at it. But how many guys have read this document before using pjsua API? I don't think so. I will close my mouth if you don't like some advice. I tried Dual Quard Core Xeon server and 1000 cps, 20000 channels. Is it enough? I know may be it is difference between pjsip-ua api and pjsua api.But The key is our own program logic design must follow this guide.And spend enough time to understand why we need follow this guide. Then the world will be more easy. regards, Gang On Wed, May 20, 2009 at 3:26 PM, M.S. <hamstiede at yahoo.de> wrote: Hello Gang, to read the wiki guideline is the first step to resolve the problems, but if you have a multi threaddding/hyper thredding/multi-processor system and many connection for one agent, the race conditions will increase. i think i read the whole pjsip mail archive, and every month i heard about deadlock situations, infinity loops e.g. In my own applications i often use __FILE__ and __LINE__ debug outputs for mutex systems. it works fine !!! regards mark ________________________________ Von: Gang Liu <gangban.lau at gmail.com> An: pjsip list <pjsip at lists.pjsip.org> Gesendet: Mittwoch, den 20. Mai 2009, 05:30:51 Uhr Betreff: Re: [pjsip] debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used. http://trac.pjsip.org/repos/wiki/PJSUA_Locks Is this guide line useful? Indeed, it is still possible to create some SIP message flow to cause PJSUA or PJSIP-UA deadlock.But this is another topic. regards, Gang On Mon, May 18, 2009 at 4:11 PM, M.S. <hamstiede at yahoo.de> wrote: thank you for your detailed debug summary. I am using linux but __FILE__ and __LINE__ is working too. For the thread tagging, i will use the linux pids (process-id). I thought i have a deadlock because: - if i have 1 or 2 connections(calls) i have < 5 % processor use (small arm system). -if i have 4 connections if get (only for one thread) 80-100% processor use. (after a view seconds). (for each media stream i use a separate conference bridge. all this threads will used only (<2% cpu)) - after this i get some deadlock informations form the stack like: possible deadlock ........ Questions: it is possible that this could happened if any stack state machine checks his own mutex like "bool IsPjsuaLocked()" ? I think i will build this stuff your wrote in the pjlib (thread tagging and detaild mutex logging with __FILE__ and __LINE__) thank you Mark ________________________________ Von: David Clark <vdc1048 at tx.rr.com> An: pjsip list <pjsip at lists.pjsip.org>; pjsip list <pjsip at lists.pjsip.org> Gesendet: Samstag, den 16. Mai 2009, 09:08:42 Uhr Betreff: [pjsip] debug tips for cpu usage going wrong, deadlock issues, within pjsip or not. win32 specific thread profiling function used. No 100% might well be a bug but most likely not deadlock. deadlock is where you are locked by the OS waiting for a mutex you will never get. This locked state uses 0% cpu. The usual symptom I see is after x number of calls. pjsip just does not communicate with anyone. Even SJPhone the sip phone application is ignored. But how to resolve your cpu goes to 100% issue. Yea not putting sleep(1) in some of your loops might be the cause of it. But if you have a multithread program in windows that is using 100% of the cpu and you don't know why. I got some tricks to tell you where. Tell you which thread is using that cpu and rule out others. The solution is to create a linked list of every thread in the box yorus and pjsip threads. You can do this with a wrapper function around CreateThread() for your stuff. You can do this by having pjsip's win32 create thread function call a function which simply adds the thread pjsip created to your linked list. Ok got your linked list. Next step. At regular intervals like say once a minute call a function which for each thread in your linked list calls this function: /************************************************************************/ /* */ /* Given a thread handle this function will determine the percent of */ /* time the thread has spent in user mode and kernal mode. */ /* */ /************************************************************************/ int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode) { __int64 user_percent; __int64 kernel_percent; LARGE_INTEGER process_utime; LARGE_INTEGER process_ktime; LARGE_INTEGER thread_utime; LARGE_INTEGER thread_ktime; FILETIME process_creation_time; FILETIME process_exit_time; FILETIME process_user_time; FILETIME process_kernel_time; FILETIME thread_creation_time; FILETIME thread_exit_time; FILETIME thread_user_time; FILETIME thread_kernel_time; BOOL retval1, retval2; int retval=0; // assume failure. retval1=GetProcessTimes(GetCurrentProcess(), &process_creation_time, &process_exit_time, &process_kernel_time, &process_user_time); retval2=GetThreadTimes(thread, &thread_creation_time, &thread_exit_time, &thread_kernel_time, &thread_user_time); if ((retval1) && (retval2)) // if both functions worked. { memcpy(&process_utime, &process_user_time, sizeof(FILETIME)); memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME)); memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME)); memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME)); if (process_utime.QuadPart==0) user_percent=0; else user_percent=thread_utime.QuadPart*100/process_utime.QuadPart; if (process_ktime.QuadPart==0) kernel_percent=0; else kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart; *user_mode=(int)user_percent; *kernel_mode=(int)kernel_percent; retval=1; } return(retval); } You will find most threads will be 0 and 0 and one will be significantly higher say 30 and 40. The one that is significantly higher is the offender. Hope this helps and makes sense. On the deadlock debugging I did do. I did this. I moved pjsip debug level at compile time and runtime to 6. But that was still not enough to tell me what I needed to know. So I augumented the mutex lock/unlock/try/destroy functions in pjilib with a version that gave the callers __FILE__, and __LINE__. __FILE__ gives the source module of the caller, and __LINE__ gives the line number the function was called from. I did something like this: #define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__) _mutex_lock(multex *lock, char *file, int line) { // in here I added file, and line to the level 6 debug output. } Run that and you know where you blocked and where you locked prior to that which gives the complete story. David Clark ps. Note this does produce a ton of debug logs. We are talking gigabytes. In my applications I have been able to give mutex locking information as part of the thread usage reports. I list what thread is blocked on which mutex where the block happened and where ownership was last successfully obtained. The complete story on one line. I have not put this into pjsip yet. It involved replacing all mutex functions, and since my application never uses a try mutex function, I don't have that one. And pjsip uses that heavily. So I would need to add that function. At 02:54 AM 5/15/2009, M.S. wrote: i think you have great skills to debug deadlock situations in pjsip. can you give me a good advice to debug pjsip/pjsua mutex stuff (log-level, debug outputs etc.) i have a situation that pjsip got 100% processor use. i think i have a deadlock. regards mark Von: David Clark <vdc1048 at tx.rr.com> An: pjsip list <pjsip at lists.pjsip.org>; pjsip list <pjsip at lists.pjsip.org> Gesendet: Donnerstag, den 14. Mai 2009, 22:22:56 Uhr Betreff: [pjsip] pjsip version 1.1 gotcha #2 with work around. This is similar to gotcha #1. So I will go into less detail. If the on_call_state() handler which gets hangup notifications calls pjsua_recorder_destroy(). The box can stop communicating after that for the following reason. 1) pjsua_recorder_destroy() locks pjsua mutex. 2) pjsua_recorder_destroy() locals conf mutex indirectly by calling pjsua_conf_remove_port(). But if conf mutex is locked in port audio clock thread get_frame() at the point of the call. You will block waiting for conf mutex and other threads will never get pjsua mutex. Work around: don't call pjsua_recorder_destroy() in on_call_state() signal the application thread to wait up and do it in the application thread. David Clark _______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip at lists.pjsip.org http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org _______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip at lists.pjsip.org http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org _______________________________________________ Visit our blog: http://blog.pjsip.org pjsip mailing list pjsip at lists.pjsip.org http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.pjsip.org/pipermail/pjsip_lists.pjsip.org/attachments/20090520/9618e027/attachment-0001.html>