On 2019-12-11 3:16 p.m., Helge Deller wrote:
> Up to now we tried to optimize the ldcw usage by using the coherent
> completer of this command, which operates on the cache (instead of
> memory) and thus might speed up things, and which was enabled by default
> on our 64bit kernel build.
>
> But we still see runtime locking problems, so this patch changes it back
> to use ldcw for 32- and 64-bit kernels, and live-patches it at runtime
> to use the coherent completer when running on a uniprocessor machine.

I'm not convinced this is the problem.  Nominally, every PA 2.0 machine
that we support is coherent.

Is there evidence that this actually helps?  I did a test where I switched
"ldcw,co" to "ldcw" and didn't find a significant difference.  So, I left
the default assumption in gcc that most PA 2.0 machines are coherent.

I'm seeing different behavior for pthread_mutex_lock/pthread_mutex_unlock
with different glibc versions.  The locking issues also seem to vary from
one kernel version to the next.

I don't know that we can blame the two build failures of acl2_8.1dfsg-6 on
phantom on a locking issue, but phantom failed twice at the same point.  In
both cases, cc1 terminated with a segmentation fault.  Yet, mx3210 has been
chugging away for more than a day on the package.  It also built -4 and -5.

I don't have a clue what's really wrong, but I suspect the slowness of our
locking infrastructure is what exposes these issues.

I've seen one issue in user space where a pointer to a mutex got corrupted
in apt-cacher-ng.  If I remember correctly, the LWS locking code was
spinning with a pointer value of 0x12.  I think the code should have
faulted, but the thread got stuck.  I had to run "systemctl restart
apt-cacher-ng".

Dave

--
John David Anglin  dave.anglin@xxxxxxxx