Re: Help me: deadlock reported in pthread_mutex_lock

Hi CVS,

I found no related item in the latest CPU errata.

The issue is difficult to reproduce, and glibc is hard to debug because
it is provided only in binary form by a third party. According to the
coredump, the mutex should become free after a while. So I applied a
workaround: I changed that mutex's attribute to
PTHREAD_MUTEX_PI_NORMAL_NP (the mutex is shared by two cores on SMP)
and replaced pthread_mutex_lock with
'do pthread_mutex_timedlock while its return value is ETIMEDOUT'. I
hope it will work, but it will take a long time to verify whether it
does.
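
A minimal sketch of the workaround described above, not the actual
code from the affected application. It assumes the priority-inheritance
kind is requested through the public pthread_mutexattr_setprotocol()
call with PTHREAD_PRIO_INHERIT rather than by setting the kind
directly. The helper names init_pi_mutex and lock_with_retry and the
timeout_ms parameter are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: initialise a mutex with priority inheritance.
 * Returns 0 on success or an errno value (e.g. ENOTSUP) on failure. */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: retry pthread_mutex_timedlock for as long as it
 * reports ETIMEDOUT. Any other result (0 or a real error) is returned. */
int lock_with_retry(pthread_mutex_t *m, long timeout_ms)
{
    int rc;

    assert(m != NULL);
    do {
        struct timespec deadline;

        /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME time. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        rc = pthread_mutex_timedlock(m, &deadline);
    } while (rc == ETIMEDOUT);

    return rc;
}
```

The caller would use lock_with_retry(&m, 100) where it previously
called pthread_mutex_lock(&m). Each timeout makes the thread re-read
the lock state instead of sleeping indefinitely on a stale value.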

Thanks again for your concern!

B.R.
Yimin

> Hi CVS,
>
> I sincerely appreciate your effort on this issue!
>
> > > I could not imagine a scenario that leads to 3 different values of
> > > the same variable mutex->__data.__lock seen in 3 positions.
> > Is there a common pattern to these 3 different values?
> > - Do they look like memory bit-flips (if yes, then misbehaviour /
> > misconfigured HW?)
>
> For example, when ThreadA's tid is 0x1100, the 'oldval' on the
> stack is 0x11c1 (ThreadB's tid on another CPU). When ThreadA's tid is
> 0x1000, the 'oldval' is 0x1057. So it does not look like a bit-flip
> between these 2 values. Which value do you mean is bit-flipped?
>
> > - Do they always contain the same pattern (if yes, then
> > buffer-overflow? Add "guard-bytes" around the struct and attempt to
> > reproduce the behaviour)
>
> On at least one occurrence, the variable at the address
> (&(mutex->__data.__lock) - 8) (a global variable that saves the value
> of sysconf(_SC_NPROCESSORS_ONLN)) is correct, i.e. not overwritten.
> (&(mutex->__data.__lock) - 4) is a *fill* (i.e. not actually used) and
> its value in the coredump is 0. According to the coredump,
> mutex->__data.__kind (whose address follows
> &(mutex->__data.__lock)) is also correct. I could not add guard-bytes
> to the struct pthread_mutex_t, because glibc is included in the
> toolchain provided by a third party.
>
>
> > Slightly tangential.
> > Have you checked for any unpatched HW errata in CPU, cache controller,
> > memory controller
> > that can cause writes posted by the CPU to be lost in some rare scenarios?
>
> I have asked for the latest version of the errata.
>
> I'll update you if there's any progress.
>
> B.R.
> Yimin


