At 02:47 AM 5/20/2009, Gang Liu wrote:
>Everyone has his own preferred method to debug and trace problems. I
>have nothing to say against that.

Certainly. I just offered mine to try to help this situation and kick
ideas around.

>
>But how many guys have read this document before using the pjsua API?
>Not many, I think.

Nope, I certainly don't recall reading that document.

>
>I will close my mouth if you don't like some advice.
>
>I tried a dual quad-core Xeon server with 1000 cps and 20000 channels.
>Is that enough?

OK, let's talk. 20000 channels: I don't get anywhere near that number.
Part of the issue is that my app is fully multithreaded. My testing on
Windows XP/XP x64 indicates Windows will give you new thread allocations
but simply fail to give them a timeslice anytime soon. That test was
totally outside pjsip, just to see where the threading limits were. The
limit is around 3100 threads. You can go higher with a smaller default
stack. The default stack for Windows is 1 MB, so that becomes an issue in
and of itself. But alas, I have not been able to duplicate the result of
going higher in thread count with a smaller default stack size. The limit
I saw there was 13,000 threads.

But now for my question on your test. You do 1000 cps, which I assume is
calls per second? The call duration must be short, with not much
happening after the call is established.

My basic hammer test is to do this:

Box A                          Box B
makecall to Box B              receive call
record call to Box B           makecall to Box A as a separate call
Play a 3 minute message        bridge incoming and outbound calls
                               record the conversation.

This is all done with heavy use of the pjsua media library, and I am not
able to go over 93 callers on either box. Now, I know the pjsua media
library is the limitation. But take that away and I am lost. My question
to you is: have you come up with a way to do audio functionality without
the pjsua media library?
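
To put a rough number on the stack size point, here is a minimal sketch in
C. It assumes the only limit is the address space reserved for thread
stacks. The function name and the example figures (2 GB of user address
space for a 32-bit process, 1 MB default stack reservation) are my
assumptions for illustration, not measurements. Real limits also depend on
heap, DLLs and kernel resources, so treat the result as a ceiling only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: upper bound on the number of threads when the only
 * constraint considered is address space reserved for thread stacks.
 *
 *   addr_space_bytes    - user address space available to the process
 *   other_use_bytes     - address space already taken by code, heap, etc.
 *   stack_reserve_bytes - stack reservation per thread (must be > 0)
 */
uint64_t max_threads_by_stack(uint64_t addr_space_bytes,
                              uint64_t other_use_bytes,
                              uint64_t stack_reserve_bytes)
{
    assert(stack_reserve_bytes > 0);

    /* Nothing left for stacks at all. */
    if (other_use_bytes >= addr_space_bytes)
        return 0;

    /* Whole stacks that fit in what remains. */
    return (addr_space_bytes - other_use_bytes) / stack_reserve_bytes;
}
```

Under those assumptions, 2 GB with 1 MB stacks gives a ceiling of 2048
threads, and 64 KB stacks raise the ceiling to 32768. On Windows the
reservation can be lowered per thread through the dwStackSize argument of
CreateThread() together with the STACK_SIZE_PARAM_IS_A_RESERVATION flag.
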
>
>I know it may be a difference between the pjsip-ua API and the pjsua
>API. But the key is that our own program logic design must follow this
>guide, and we must spend enough time to understand why we need to
>follow it. Then the world will be easier.
>
>regards,
>Gang
>
>On Wed, May 20, 2009 at 3:26 PM, M.S. <hamstiede at yahoo.de> wrote:
>Hello Gang,
>
>reading the wiki guideline is the first step to resolving the
>problems, but if you have a multi-threading/hyper-threading/
>multi-processor system and many connections for one agent, the race
>conditions will increase.
>i think i have read the whole pjsip mail archive, and every month i
>hear about deadlock situations, infinite loops, etc.
>In my own applications i often use __FILE__ and __LINE__ debug
>outputs for mutex systems. it works fine !!!
>
>regards
> mark
>
>
>From: Gang Liu <gangban.lau at gmail.com>
>To: pjsip list <pjsip at lists.pjsip.org>
>Sent: Wednesday, 20 May 2009, 05:30:51
>Subject: Re: [pjsip] debug tips for cpu usage going wrong, deadlock
>issues, within pjsip or not. win32 specific thread profiling function used.
>
>http://trac.pjsip.org/repos/wiki/PJSUA_Locks
>
>Is this guideline useful?
>Indeed, it is still possible to create some SIP message flow that
>causes a PJSUA or PJSIP-UA deadlock. But this is another topic.
>
>regards,
>Gang
>
>On Mon, May 18, 2009 at 4:11 PM, M.S. <hamstiede at yahoo.de> wrote:
>thank you for your detailed debug summary. I am using linux but
>__FILE__ and __LINE__ work there too.
>For the thread tagging, i will use the linux pids (process-id).
>
>I thought i had a deadlock because:
>- if i have 1 or 2 connections (calls) i have < 5 % processor use
>(small arm system).
>- if i have 4 connections i get (only for one thread) 80-100%
>processor use (after a few seconds).
>(for each media stream i use a separate conference bridge. all these
>threads use only <2% cpu)
>- after this i get some deadlock information from the stack like:
>
>possible deadlock ........
>
>Questions:
>is it possible that this could happen if any stack state machine
>checks its own mutex, like "bool IsPjsuaLocked()" ?
>
>I think i will build the stuff you wrote into pjlib (thread
>tagging and detailed mutex logging with __FILE__ and __LINE__)
>
>
>thank you
>
> Mark
>
>
>From: David Clark <vdc1048 at tx.rr.com>
>To: pjsip list <pjsip at lists.pjsip.org>
>Sent: Saturday, 16 May 2009, 09:08:42
>Subject: [pjsip] debug tips for cpu usage going wrong, deadlock
>issues, within pjsip or not. win32 specific thread profiling function used.
>
>No, 100% might well be a bug, but most likely not deadlock. Deadlock
>is where you are locked by the OS waiting for a mutex you will never
>get. This locked state uses 0% cpu. The usual symptom I see is that
>after x number of calls pjsip just does not communicate with anyone.
>Even SJPhone, the sip phone application, is ignored.
>
>But how to resolve your cpu-goes-to-100% issue? Yes, not putting
>sleep(1) in some of your loops might be the cause of it. But if you
>have a multithreaded program in Windows that is using 100% of the cpu
>and you don't know why, I have some tricks to tell you where: they
>tell you which thread is using that cpu and rule out the others.
>
>The solution is to create a linked list of every thread in the box,
>yours and pjsip's threads. For your own threads, you can do this with
>a wrapper function around CreateThread(). For pjsip's threads, you can
>do this by having pjsip's win32 create thread function call a function
>which simply adds the thread pjsip created to your linked list.
>
>OK, got your linked list. Next step.
>At regular intervals, say once a minute, call a function which, for
>each thread in your linked list, calls this function:
>#include <windows.h>
>#include <string.h>   /* memcpy */
>
>/************************************************************************/
>/*                                                                      */
>/* Given a thread handle, this function determines what percent of the  */
>/* process's user-mode and kernel-mode CPU time the thread has used.    */
>/* Returns 1 on success, 0 on failure.                                  */
>/*                                                                      */
>/************************************************************************/
>int thread_percent(HANDLE thread, int *user_mode, int *kernel_mode)
>{
>   __int64 user_percent;
>   __int64 kernel_percent;
>
>   LARGE_INTEGER process_utime;
>   LARGE_INTEGER process_ktime;
>   LARGE_INTEGER thread_utime;
>   LARGE_INTEGER thread_ktime;
>
>   FILETIME process_creation_time;
>   FILETIME process_exit_time;
>   FILETIME process_user_time;
>   FILETIME process_kernel_time;
>
>   FILETIME thread_creation_time;
>   FILETIME thread_exit_time;
>   FILETIME thread_user_time;
>   FILETIME thread_kernel_time;
>
>   BOOL retval1, retval2;
>   int retval=0;   // assume failure.
>
>
>   retval1=GetProcessTimes(GetCurrentProcess(),
>                           &process_creation_time,
>                           &process_exit_time,
>                           &process_kernel_time,
>                           &process_user_time);
>
>   retval2=GetThreadTimes(thread,
>                          &thread_creation_time,
>                          &thread_exit_time,
>                          &thread_kernel_time,
>                          &thread_user_time);
>
>   if ((retval1) && (retval2))   // if both functions worked.
>   {
>      // FILETIME and LARGE_INTEGER are both 64 bits wide.
>      memcpy(&process_utime, &process_user_time, sizeof(FILETIME));
>      memcpy(&process_ktime, &process_kernel_time, sizeof(FILETIME));
>      memcpy(&thread_utime, &thread_user_time, sizeof(FILETIME));
>      memcpy(&thread_ktime, &thread_kernel_time, sizeof(FILETIME));
>
>      if (process_utime.QuadPart==0)
>         user_percent=0;
>      else
>         user_percent=thread_utime.QuadPart*100/process_utime.QuadPart;
>      if (process_ktime.QuadPart==0)
>         kernel_percent=0;
>      else
>         kernel_percent=thread_ktime.QuadPart*100/process_ktime.QuadPart;
>
>      *user_mode=(int)user_percent;
>      *kernel_mode=(int)kernel_percent;
>      retval=1;
>   }
>   return(retval);
>}
>You will find most threads will be 0 and 0, and one will be
>significantly higher, say 30 and 40. The one that is significantly
>higher is the offender.
>
>Hope this helps and makes sense.
>
>On the deadlock debugging, I did this. I moved the pjsip debug level
>to 6, at compile time and at runtime. But that was still not enough to
>tell me what I needed to know. So I augmented the mutex
>lock/unlock/try/destroy functions in pjlib with a version that gave
>the caller's __FILE__ and __LINE__. __FILE__ gives the source module
>of the caller, and __LINE__ gives the line number the function was
>called from.
>
>I did something like this:
>#define mutex_lock(lock) _mutex_lock(lock, __FILE__, __LINE__)
>
>void _mutex_lock(mutex *lock, const char *file, int line)
>{
>   // in here I added file and line to the level 6 debug output.
>}
>
>Run that and you know where you blocked and where you locked prior
>to that, which gives the complete story.
>
>David Clark
>ps. Note this does produce a ton of debug logs. We are talking
>gigabytes. In my applications I have been able to give mutex locking
>information as part of the thread usage reports. I list which thread
>is blocked on which mutex, where the block happened, and where
>ownership was last successfully obtained. The complete story on one
>line. I have not put this into pjsip yet.
>It involved replacing all mutex functions, and since my application
>never uses a try mutex function, I don't have that one. And pjsip uses
>that heavily. So I would need to add that function.
>
>At 02:54 AM 5/15/2009, M.S. wrote:
>>i think you have great skills to debug deadlock situations in pjsip.
>>can you give me some good advice on debugging pjsip/pjsua mutex stuff
>>(log-level, debug outputs etc.)?
>>
>>i have a situation where pjsip gets 100% processor use. i think i
>>have a deadlock.
>>
>> regards
>> mark
>>
>>
>>From: David Clark <vdc1048 at tx.rr.com>
>>To: pjsip list <pjsip at lists.pjsip.org>
>>Sent: Thursday, 14 May 2009, 22:22:56
>>Subject: [pjsip] pjsip version 1.1 gotcha #2 with work around.
>>
>>This is similar to gotcha #1, so I will go into less detail. If
>>the on_call_state() handler which gets hangup notifications calls
>>pjsua_recorder_destroy(), the box can stop communicating after that
>>for the following reason.
>>1) pjsua_recorder_destroy() locks pjsua mutex.
>>2) pjsua_recorder_destroy() locks conf mutex indirectly by calling
>>pjsua_conf_remove_port().
>>
>>But if conf mutex is locked in the port audio clock thread's
>>get_frame() at the point of the call, you will block waiting for
>>conf mutex and other threads will never get pjsua mutex.
>>
>>Work around: don't call pjsua_recorder_destroy() in on_call_state().
>>Signal the application thread to wake up and do it in the
>>application thread.
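
The work around above can be sketched roughly like this in C. This is only
an illustration under my own assumptions: the names
defer_recorder_destroy(), run_deferred_destroys() and
fake_recorder_destroy() are hypothetical, the queue size is arbitrary, and
the stand-in destroy function takes the place of a wrapper around
pjsua_recorder_destroy(). The callback only records the recorder id. The
application thread later takes the ids out and does the destroy with no
queue lock held. Waking the application thread (an event or semaphore) is
left out.

```c
#include <assert.h>
#include <stdatomic.h>

#define PENDING_MAX 64                 /* arbitrary size for the sketch */

typedef void (*destroy_fn)(int recorder_id);

static atomic_flag q_lock = ATOMIC_FLAG_INIT;  /* tiny spin lock */
static int q_ids[PENDING_MAX];
static int q_count = 0;

static void q_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&q_lock, memory_order_acquire))
        ;                              /* spin; held only for a few stores */
}

static void q_release(void)
{
    atomic_flag_clear_explicit(&q_lock, memory_order_release);
}

/* Call this from on_call_state(): it never calls into pjsua.
 * Returns 1 if the id was queued, 0 if the queue is full. */
int defer_recorder_destroy(int recorder_id)
{
    int queued = 0;

    q_acquire();
    if (q_count < PENDING_MAX) {
        q_ids[q_count++] = recorder_id;
        queued = 1;
    }
    q_release();
    return queued;
}

/* Call this from the application thread. The ids are copied out under
 * the lock, then destroy() runs with the lock released.
 * Returns the number of recorders handed to destroy(). */
int run_deferred_destroys(destroy_fn destroy)
{
    int local[PENDING_MAX];
    int n, i;

    assert(destroy != 0);

    q_acquire();
    n = q_count;
    for (i = 0; i < n; i++)
        local[i] = q_ids[i];
    q_count = 0;
    q_release();

    for (i = 0; i < n; i++)
        destroy(local[i]);
    return n;
}

/* Stand-in for a wrapper around pjsua_recorder_destroy(); it only
 * remembers what it was asked to destroy. */
static int fake_last_id = -1;
static int fake_calls = 0;

void fake_recorder_destroy(int recorder_id)
{
    fake_last_id = recorder_id;
    fake_calls++;
}
```

In a real application on_call_state() would call
defer_recorder_destroy() with the recorder id for the call, and the
application thread's loop would call run_deferred_destroys() with a
function that calls pjsua_recorder_destroy().
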
>>
>>David Clark
>>
>>_______________________________________________
>>Visit our blog: http://blog.pjsip.org
>>
>>pjsip mailing list
>>pjsip at lists.pjsip.org
>>http://lists.pjsip.org/mailman/listinfo/pjsip_lists.pjsip.org