andres@xxxxxxxxxxx (Andres Freund) writes:
> On 2017-11-21 18:50:05 -0500, Tom Lane wrote:
>> (If Justin saw that while still on 9.6, then it'd be worth looking
>> closer.)

> Right. I took this to be referring to something before the current
> migration, but I might have overinterpreted things. There've been
> various forks/ports of pg around that had hand-coded replacements with
> futex usage, and there were definitely buggy versions going around a few
> years back.

Poking around in the archives reminded me of this thread:

https://www.postgresql.org/message-id/flat/14947.1475690465@xxxxxxxxxxxxx

which describes symptoms uncomfortably close to what Justin is showing.

I remember speculating that the SysV-sema implementation, because it'd
always enter the kernel, would provide some memory barrier behavior
that POSIX-sema code based on futexes might miss when taking the no-wait
path. I'd figured that any real problems of that sort would show up
pretty quickly, but that could've been overoptimistic. Maybe we need
to take a closer look at where LWLocks devolve to blocking on the
process semaphore and see if there are any implicit assumptions about
barriers there.

			regards, tom lane
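
To make the no-wait-path concern concrete, here is a minimal sketch of a toy counting semaphore in C11. It is not PostgreSQL's semaphore code and not glibc's sem_wait; every name in it (toy_sema, toy_sema_wait, run_handoff_demo, and so on) is hypothetical, and sched_yield() stands in for the futex wait that a real implementation would use. The point it illustrates is that when the waiter finds the count already positive, it never enters the kernel, so the only ordering it gets is whatever the atomic operations themselves request.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy semaphore, for illustration only. */
typedef struct
{
    atomic_int count;
} toy_sema;

static void toy_sema_init(toy_sema *s, int initial)
{
    atomic_init(&s->count, initial);
}

/* Post: release ordering publishes the poster's earlier writes. */
static void toy_sema_post(toy_sema *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}

/*
 * Wait: when count > 0 we take the no-wait path and never enter the
 * kernel, so the acquire ordering on the successful compare-exchange
 * is the only barrier the waiter gets.
 */
static void toy_sema_wait(toy_sema *s)
{
    for (;;)
    {
        int c = atomic_load_explicit(&s->count, memory_order_relaxed);

        if (c > 0)
        {
            if (atomic_compare_exchange_weak_explicit(&s->count, &c, c - 1,
                                                      memory_order_acquire,
                                                      memory_order_relaxed))
                return;
            continue;           /* lost a race or spurious failure: retry */
        }
        sched_yield();          /* assumption: stand-in for a futex wait */
    }
}

static toy_sema sema;
static int shared_payload;      /* plain, non-atomic data handed to the waiter */

static void *waker(void *arg)
{
    (void) arg;
    shared_payload = 42;        /* write the data first ... */
    toy_sema_post(&sema);       /* ... then wake the waiter */
    return NULL;
}

/* Returns the payload value the waiter observed, or -1 on thread failure. */
int run_handoff_demo(void)
{
    pthread_t t;
    int seen;

    shared_payload = 0;
    toy_sema_init(&sema, 0);
    if (pthread_create(&t, NULL, waker, NULL) != 0)
        return -1;
    toy_sema_wait(&sema);
    seen = shared_payload;      /* ordered after the waker's write by release/acquire */
    pthread_join(t, NULL);
    return seen;
}
```

With the release on post and the acquire on the successful compare-exchange, the waiter is guaranteed to see the payload written before the post. If both were weakened to memory_order_relaxed, the C11 memory model would no longer give that guarantee on the no-wait path, which is the kind of implicit barrier assumption that would be worth checking for at the point where an LWLock waiter blocks on its semaphore.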