I encountered a deadlock assertion failure in glibc's pthread_mutex_lock():

pthread_mutex_lock.c:314: __pthread_mutex_lock_full: Assertion `(e) != 45 || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.

(45 is EDEADLK on MIPS.)

Environment:
  glibc: 2.16
  linux: 3.10.87-rt80-Cavium-Octeon
  arch:  MIPS

The sequence of events:

1. ThreadA called __pthread_mutex_lock_full(mutex). The type of the mutex is PTHREAD_MUTEX_PI_RECURSIVE_NP or PTHREAD_MUTEX_PI_ERRORCHECK_NP.

2. ThreadA found the value of mutex->__data.__lock to be another task ThreadB's tid, so it entered the Linux kernel via the futex system call. (The automatic variable 'oldval' in __pthread_mutex_lock_full, holding that value, was stored on the stack.)

3. The kernel, however, found the value of mutex->__data.__lock to be ThreadA's own tid, at the check `if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))` in futex_lock_pi_atomic(), so it returned -EDEADLK.

4. __pthread_mutex_lock_full() saw e == EDEADLK for an errorcheck/recursive kind, and the assertion above fired.

5. A core dump was generated. In the core file the value of mutex->__data.__lock is 0, and ThreadB is at the start of its entry function, e.g. waiting for the next message to process (i.e. it has already released the lock):

$5 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 33, __nusers = 0,
      {__spins = 0, __list = {__next = 0x0}}},
  __size = '\000' <repeats 15 times>, "!\000\000\000\000\000\000\000", __align = 0}

(__kind = 33 = 0x20 | 0x1 = PTHREAD_MUTEX_PRIO_INHERIT_NP | PTHREAD_MUTEX_RECURSIVE_NP, i.e. a PI recursive mutex.)

ThreadA and ThreadB belong to the same process but run on different CPUs (SMP).

To debug this issue I added a printk to the kernel, and it confirms that ThreadA "deadlocked" against itself; the displayed uaddr is &mutex->__data.__lock:

@@ -997,8 +1093,13 @@ static int futex_lock_pi_atomic(u32 __us
 	/*
 	 * Detect deadlocks.
 	 */
-	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
+	if ((unlikely((uval & FUTEX_TID_MASK) == vpid))) {
+		printk(KERN_ERR "uaddr:%p, uval:%u, vpid:%u, task:%s(%d),prio:%d,normal:%d, current:%s(%d),prio:%d,normal:%d\n", uaddr, (unsigned)uval, (unsigned)vpid, task->comm, task_pid_nr(task), task->prio, task->normal_prio, current->comm, task_pid_nr(current), current->prio, current->normal_prio);
+		show_stack(task, NULL);
+		if (current != task)
+			show_stack(current, NULL);
 		return -EDEADLK;
+	}

The relevant fragment of __pthread_mutex_lock_full():

      int newval = id;
#ifdef NO_INCR
      newval |= FUTEX_WAITERS;
#endif
      oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
                                                    newval, 0);
      if (oldval != 0)
        {
          /* The mutex is locked.  The kernel will now take care of
             everything.  */
          int private = (robust
                         ? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
                         : PTHREAD_MUTEX_PSHARED (mutex));
          INTERNAL_SYSCALL_DECL (__err);
          int e = INTERNAL_SYSCALL (futex, __err, 4, &mutex->__data.__lock,
                                    __lll_private_flag (FUTEX_LOCK_PI,
                                                        private), 1, 0);

The corresponding MIPS disassembly; the CAS is implemented as a load-linked (LL) / store-conditional (SC) retry loop (a plain-C rendering of this loop is sketched at the end of this mail):

0x77f4ed98 <+616>:	sw	zero,0(sp)
0x77f4ed9c <+620>:	ll	v1,0(s0)	// v1 = mutex->__data.__lock (load-linked)
0x77f4eda0 <+624>:	bnez	v1,0x77f4edb4 <__pthread_mutex_lock_full+644>	// nonzero -> already locked, skip SC
0x77f4eda4 <+628>:	move	at,s1
0x77f4eda8 <+632>:	sc	at,0(s0)	// store-conditional: mutex->__data.__lock = current task's tid
0x77f4edac <+636>:	beqz	at,0x77f4ed9c <__pthread_mutex_lock_full+620>	// SC failed -> retry from LL
0x77f4edb0 <+640>:	nop
0x77f4edb4 <+644>:	beqz	v1,0x77f4eea0 <__pthread_mutex_lock_full+880>
0x77f4edb8 <+648>:	sw	v1,0(sp)	// spill oldval to the stack
0x77f4edbc <+652>:	bnez	a4,0x77f4edcc <__pthread_mutex_lock_full+668>
0x77f4edc0 <+656>:	li	v0,128
0x77f4edc4 <+660>:	lw	v0,12(s0)

I cannot come up with a scenario that leads to three different values of the same variable mutex->__data.__lock being observed at three points: ThreadB's tid in user space, ThreadA's own tid in the kernel, and 0 in the core dump.

It is very difficult to reproduce this issue (roughly one occurrence every one to several months), and we failed to reproduce it with a small standalone application (a sketch of the kind of test we tried follows at the end of this mail).

Any help is welcome!
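For readers less familiar with MIPS, here is a plain-C rendering of the LL/SC fast path shown in the disassembly above. This is only a sketch using C11 atomics, not glibc's actual implementation (glibc 2.16 uses its own internal macros); the function name is mine:

#include <stdatomic.h>

/* Sketch of the userspace fast path: atomically install our tid into
   mutex->__data.__lock if it is 0 (unlocked).  Returns the old value:
   0 on success, the current owner's tid otherwise, in which case
   glibc falls back to the FUTEX_LOCK_PI system call.  This mirrors
   the ll/bnez/sc/beqz retry loop in the disassembly.  */
static int
try_lock_fast_path (_Atomic int *lock, int tid)
{
  int expected = 0;
  /* On failure, 'expected' is overwritten with the value actually
     seen in *lock, i.e. the owner's tid.  */
  atomic_compare_exchange_strong_explicit (lock, &expected, tid,
                                           memory_order_acquire,
                                           memory_order_relaxed);
  return expected;  /* plays the role of 'oldval' above */
}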
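And for reference, a minimal sketch of the sort of small test application we tried when attempting to reproduce (the thread structure and names are illustrative assumptions, not our production code); it hammers a process-private PI recursive mutex from two threads but never triggered the assertion for us:

/* Build: gcc -O2 -pthread stress.c -o stress */
#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t m;

static void *
worker (void *arg)
{
  (void) arg;
  for (;;)
    {
      /* Recursive PI mutex: lock twice, unlock twice.  */
      if (pthread_mutex_lock (&m) != 0)
        abort ();
      if (pthread_mutex_lock (&m) != 0)
        abort ();
      pthread_mutex_unlock (&m);
      pthread_mutex_unlock (&m);
    }
  return NULL;
}

int
main (void)
{
  pthread_mutexattr_t a;
  pthread_t t1, t2;

  pthread_mutexattr_init (&a);
  /* PTHREAD_PRIO_INHERIT + PTHREAD_MUTEX_RECURSIVE is what glibc maps
     to the internal PTHREAD_MUTEX_PI_RECURSIVE_NP kind (the __kind = 33
     seen in the core dump).  */
  pthread_mutexattr_setprotocol (&a, PTHREAD_PRIO_INHERIT);
  pthread_mutexattr_settype (&a, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init (&m, &a);

  pthread_create (&t1, NULL, worker, NULL);
  pthread_create (&t2, NULL, worker, NULL);
  pthread_join (t1, NULL);  /* runs forever; interrupt to stop */
  pthread_join (t2, NULL);
  return 0;
}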
B.R. Yimin