Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes:
> On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
>> Could you try stracing next time?

> I straced all the "startup" PIDs, which were all in futex, without exception:

If you've got debug symbols installed, could you investigate the states
of the LWLocks the processes are stuck on?

My hypothesis about a missed memory barrier would imply that there's (at
least) one process that's waiting but is not in the lock's wait queue and
has MyProc->lwWaiting == false, while the rest are in the wait queue and
have MyProc->lwWaiting == true. Actually chasing through the list
pointers would be slightly tedious, but checking MyProc->lwWaiting, and
maybe MyProc->lwWaitMode, in each process shouldn't be too hard. Also
verify that they're all waiting for the same LWLock (by address).

I recognize Andres' point that on x86 lock-prefixed instructions should
be full memory barriers, and at least on my Linux machines, there do seem
to be lock-prefixed instructions in the fast paths through sem_wait and
sem_post. But the theory fits the reported evidence awfully well, and we
have no other theory that fits at all.

[ in an earlier post: ]
> BTW this is a VM run on a hypervisor managed by our customer:
> DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012

Hmm. Can't avoid the suspicion that that's relevant somehow.

			regards, tom lane
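
PS: once the per-process values have been read out with a debugger, the
check described above can be tallied along these lines. This is only a
rough Python sketch, not anything from the PostgreSQL tree: the
WaiterState record and the summarize() helper are hypothetical names, and
it assumes the lock address, MyProc->lwWaiting and MyProc->lwWaitMode
were already collected by hand for each stuck backend.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WaiterState:
    """Values read by hand from one stuck backend (hypothetical record)."""
    pid: int
    lock_addr: int     # address of the LWLock the backend is blocked on
    lw_waiting: bool   # MyProc->lwWaiting as seen in the debugger
    lw_wait_mode: int  # MyProc->lwWaitMode as seen in the debugger


def summarize(states: List[WaiterState]) -> Dict[str, object]:
    """Tally the observations against the missed-barrier hypothesis."""
    locks = {s.lock_addr for s in states}
    # Backends that are blocked but do not believe they are queued.
    not_flagged = sorted(s.pid for s in states if not s.lw_waiting)
    # Backends that are blocked and flagged as waiting.
    flagged = sorted(s.pid for s in states if s.lw_waiting)
    same_lock = len(locks) == 1
    return {
        "same_lock": same_lock,
        "not_flagged": not_flagged,
        "flagged": flagged,
        # Consistent with the hypothesis: everyone is stuck on one lock
        # and at least one blocked backend has lwWaiting == false.
        "consistent": same_lock and len(not_flagged) >= 1,
    }
```

A result with "consistent" false would not by itself rule anything out;
it only says the collected values don't show the pattern described
above.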