Hi Steve,

Thanks for your reply!

> > ThreadA found the value of mutex->__data.__lock is another task
> > ThreadB's tid. So it entered the linux kernel via system call. (the
> > auto variable 'oldval' in __pthread_mutex_lock_full was stored in the
> > stack)
> >
> > The linux kernel find the value mutex->__data.__lock is ThreadA itself
> > in 'if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))' in
> > futex_lock_pi_atomic(), So return -EDEADLK.
>
> That sounds like the kernel found that ThreadB is blocked on something
> owned by ThreadA which would be a deadlock.

In fact, the kernel found that ThreadA had locked the mutex itself, i.e.
that it tried to lock again while already holding it. As you know, when the
mutex's attribute is PTHREAD_MUTEX_PI_RECURSIVE_NP or
PTHREAD_MUTEX_PI_ERRORCHECK_NP, it should be impossible for ThreadA to
enter the kernel while it already holds the lock, because the self-deadlock
is detected in userspace first. Yet that is exactly what happened here.
According to the auto variable 'oldval', at the moment just before entering
the kernel the owner was ThreadB. So the value of mutex->__data.__lock seen
from glibc and the value seen from the kernel are inconsistent (BTW, the
value is 0 as seen from the coredump).

> > __pthread_mutex_lock_full() judge the return value and asserted.
> >
> > coredump file generated, and the value mutex->__data.__lock in the
> > coredump file is 0. And the ThreadB is in the start of the entry
> > function, for example waiting another message to be processed (i.e.
> > has released the lock).
> > $5 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 33,
> > __nusers = 0, {__spins = 0, __list = {__next = 0x0}}},
> > __size = '\000' <repeats 15 times>, "!\000\000\000\000\000\000\000",
> > __align = 0}
>
> Are you saying that ThreadB isn't blocked on anything? Or could it be
> possible that the crash of ThreadA released whatever ThreadB was
> blocked on before ThreadB was taken out as well?

Yes. According to the coredump, ThreadB isn't blocked on anything related
to the lock.

> Have you been able to try a newer kernel at all? I don't look at
> anything less that 3.18, and even for 3.18, I try to avoid.

I'm sorry that we are still on such an old version, but I really cannot try
a newer one: the kernel is deployed at customer sites, and it is very
difficult to prove that a newer kernel resolves the issue. Anyway, thanks
for your suggestion!

B.R.
Yimin
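
P.S. For reference, below is a minimal standalone test I wrote (my own
sketch, not taken from the failing system) that shows the expected
behaviour: with a priority-inheritance error-checking mutex, a second lock
attempt by the owning thread is rejected with EDEADLK by the userspace
check in __pthread_mutex_lock_full, so the futex syscall (and the assert
on the kernel's -EDEADLK) is never reached.

/* Minimal test: a PI error-checking mutex reports self-deadlock from
   userspace.  Build with: gcc -pthread test-pi-errorcheck.c
   (file name is just an example).  */
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

int
main (void)
{
  pthread_mutexattr_t attr;
  pthread_mutex_t m;

  pthread_mutexattr_init (&attr);
  /* PTHREAD_PRIO_INHERIT + PTHREAD_MUTEX_ERRORCHECK should correspond to
     the internal PTHREAD_MUTEX_PI_ERRORCHECK_NP kind handled by
     __pthread_mutex_lock_full.  */
  pthread_mutexattr_setprotocol (&attr, PTHREAD_PRIO_INHERIT);
  pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init (&m, &attr);

  pthread_mutex_lock (&m);
  /* Second lock by the same thread: expect EDEADLK returned cleanly,
     not an assertion failure.  */
  int ret = pthread_mutex_lock (&m);
  printf ("second lock returned %d (%s)\n", ret, strerror (ret));

  pthread_mutex_unlock (&m);
  pthread_mutex_destroy (&m);
  pthread_mutexattr_destroy (&attr);
  return 0;
}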