Re: Help me: deadlock reported in pthread_mutex_lock

Hi Steve,

Thanks for your reply!

> > ThreadA found the value of mutex->__data.__lock is another task
> > ThreadB's tid. So it entered the linux kernel via system call. (the
> > auto variable 'oldval' in __pthread_mutex_lock_full was stored in the
> > stack)
> >
> > The linux kernel find the value mutex->__data.__lock is ThreadA itself
> > in 'if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))' in
> > futex_lock_pi_atomic(), So return -EDEADLK.
>
> That sounds like the kernel found that ThreadB is blocked on something
> owned by ThreadA which would be a deadlock.

In fact, the kernel found that ThreadA was trying to lock a mutex it
already owned, i.e. a relock by the owner. As you know, with
PTHREAD_MUTEX_PI_RECURSIVE_NP or PTHREAD_MUTEX_PI_ERRORCHECK_NP as the
mutex type, glibc handles a relock by the owner in user space, so
ThreadA should never enter the kernel while it holds the lock. Yet
that is what happened. According to the auto variable 'oldval', the
owner just before ThreadA entered the kernel was ThreadB. So the values
of mutex->__data.__lock seen by glibc and by the kernel are
inconsistent (B.T.W. the value seen in the coredump is 0).
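
To make the inconsistency concrete, here is a minimal sketch of the
owner comparison that both glibc and the kernel apply to the futex
word. The bit masks mirror the definitions in <linux/futex.h>; the
helper names futex_owner_tid() and relock_by_owner() are hypothetical
and exist only for this illustration, they are not glibc or kernel
functions.

```c
#include <stdint.h>

/* Bit layout of a PI futex word, mirroring <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u  /* at least one waiter is queued */
#define FUTEX_OWNER_DIED 0x40000000u  /* previous owner exited holding it */
#define FUTEX_TID_MASK   0x3fffffffu  /* remaining bits: owner's TID */

/* TID of the owner recorded in the futex word; 0 means unlocked. */
uint32_t futex_owner_tid(uint32_t uval)
{
    return uval & FUTEX_TID_MASK;
}

/*
 * Hypothetical helper: the same comparison as the quoted kernel check
 * '(uval & FUTEX_TID_MASK) == vpid'. Nonzero means the caller already
 * owns the lock, which is the case reported as -EDEADLK.
 */
int relock_by_owner(uint32_t uval, uint32_t caller_tid)
{
    return futex_owner_tid(uval) == caller_tid;
}
```

With oldval holding ThreadB's tid, relock_by_owner(oldval, tid_of_A)
is 0, so glibc goes on to the system call. For the kernel to return
-EDEADLK, the same comparison must have been true for the value the
kernel read, i.e. that value carried ThreadA's tid.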

> > __pthread_mutex_lock_full() judge the return value and asserted.
> >
> > coredump file generated, and the value mutex->__data.__lock in the
> > coredump file is 0. And the ThreadB is in the start of the entry
> > function, for example waiting another message to be processed (i.e.
> > has released the lock).
> > $5 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 33,
> > __nusers = 0, {__spins = 0, __list = {__next = 0x0}}},
> >   __size = '\000' <repeats 15 times>, "!\000\000\000\000\000\000\000",
> > __align = 0}
>
> Are you saying that ThreadB isn't blocked on anything? Or could it be
> possible that the crash of ThreadA released whatever ThreadB was
> blocked on before ThreadB was taken out as well?

Yes. According to the coredump, ThreadB isn't blocked on anything
related to the lock.
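
For reference, the __kind = 33 in the dump can be decoded as follows.
This is only a sketch: the flag values are assumed to match glibc's
nptl/pthreadP.h (type in the low two bits, 16 = robust, 32 = priority
inheritance, 64 = priority protection, 128 = process shared) and
should be checked against the glibc version in use. The enum and
function names are hypothetical.

```c
/* Assumed __kind bit values, as in glibc's nptl/pthreadP.h. */
enum {
    KIND_TYPE_MASK    = 3,    /* 0 normal, 1 recursive, 2 errorcheck, 3 adaptive */
    KIND_ROBUST       = 16,
    KIND_PRIO_INHERIT = 32,
    KIND_PRIO_PROTECT = 64,
    KIND_PSHARED      = 128
};

/* Mutex type stored in the low two bits of __kind. */
int mutex_type(int kind)
{
    return kind & KIND_TYPE_MASK;
}

/* Human-readable name for the mutex type. */
const char *mutex_type_name(int kind)
{
    static const char *const names[] = {
        "normal", "recursive", "errorcheck", "adaptive"
    };
    return names[mutex_type(kind)];
}

/* Nonzero if the priority-inheritance protocol bit is set. */
int mutex_is_pi(int kind)
{
    return (kind & KIND_PRIO_INHERIT) != 0;
}

/* Nonzero if the robust bit is set. */
int mutex_is_robust(int kind)
{
    return (kind & KIND_ROBUST) != 0;
}
```

Under these assumptions 33 = 32 | 1, i.e. a non-robust, priority
inheritance, recursive mutex, which matches
PTHREAD_MUTEX_PI_RECURSIVE_NP.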

> Have you been able to try a newer kernel at all? I don't look at
> anything less that 3.18, and even for 3.18, I try to avoid.

I'm sorry to be using an obsolete version, but I really cannot try a
newer one: this kernel is used by customers, and it would be very
difficult to prove that a newer one has resolved the issue. Anyway,
thanks for your suggestion!

B.R.
Yimin


