Hi CVS,

I sincerely appreciate your effort on this issue!

> > I could not image a scenario that lead to 3 different values on the
> > same variable mutex->__data.__lock seen in 3 positions.
> Is there a common pattern to these 3 different values?
> - Do they look like memory bit-flips (if yes, then misbehaviour /
> misconfigured HW?)

For example, when ThreadA's tid is 0x1100, the 'oldval' on the stack is
0x11c1 (ThreadB's tid on another CPU). When ThreadA's tid is 0x1000, the
'oldval' is 0x1057. So it does not look like a bit-flip between these two
values. Which value do you mean is bit-flipped?

> - Do they always contain the same pattern (if yes, then
> buffer-overflow? Add "guard-bytes" around the struct and attempt to
> reproduce the behaviour)

In at least one occurrence, the variable at the address
(&(mutex->__data.__lock) - 8) (a global variable that holds the value of
sysconf(_SC_NPROCESSORS_ONLN)) is correct, i.e. not overwritten.
(&(mutex->__data.__lock) - 4) is a *fill* (i.e. not actually used) and its
value in the coredump is 0. And according to the coredump,
mutex->__data.__kind (which is located after &(mutex->__data.__lock)) is
also correct.

I could not add guard-bytes to struct pthread_mutex_t, because glibc is
part of the toolchain provided by a third party.

> Slightly tangential.
> Have you checked for any unpatched HW errata in CPU, cache controller,
> memory controller
> that can cause writes posted by the CPU to be lost in some rare scenarios?

I'm asking for the latest version of the errata. I'll update you if
there's any progress.

B.R.
Yimin
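
P.S. To make the bit-flip question concrete, here is a minimal sketch that
counts how many bits differ between each tid and the 'oldval' seen with it.
Python is used only for illustration, and the helper name differing_bits is
hypothetical. The assumption is that a classic memory bit-flip would
usually change only one bit (or very few), so a larger count points away
from that explanation.

```python
def differing_bits(a: int, b: int) -> int:
    """Return the number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# (ThreadA's tid, 'oldval' found on the stack), taken from the two cases above.
observations = [
    (0x1100, 0x11C1),
    (0x1000, 0x1057),
]

for tid, oldval in observations:
    xor = tid ^ oldval
    print(f"tid={tid:#06x} oldval={oldval:#06x} "
          f"xor={xor:#06x} differing_bits={differing_bits(tid, oldval)}")
```

In these two cases the values differ in 3 and 5 bits respectively, which is
why I do not think a single bit-flip explains them.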