Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes:
> For starters, I found that PID 27427 has:
>
> (gdb) p proc->lwWaiting
> $1 = 0 '\000'
> (gdb) p proc->lwWaitMode
> $2 = 1 '\001'

To confirm, this is LWLockAcquire's "proc", equal to MyProc?
If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
that sure seems like a smoking gun.

> Note: I've compiled locally PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
> service up (and to the degree that serves to verify that avoids the issue,
> great).

Good idea, I was going to suggest that.  It will be very interesting
to see if that makes the problem go away.

> Would you suggest how I can maximize the likelihood/speed of triggering that?
> Five years ago, with a report of similar symptoms, you said "You need to hack
> pgbench to suppress the single initialization connection it normally likes to
> make, else the test degenerates to the one-incoming-connection case"
> https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

I don't think that case was related at all.

My theory suggests that any contended use of an LWLock is at risk,
in which case just running pgbench with about as many sessions as
you have in the live server ought to be able to trigger it.  However,
that doesn't really account for your having observed the problem
only during session startup, so there may be some other factor
involved.  I wonder if it only happens during the first wait for
an LWLock ... and if so, how could that be?

			regards, tom lane
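
For readers following along, here is a rough model of why lwWaiting = 0 in a
backend still blocked in PGSemaphoreLock looks like a smoking gun.  It is only a
sketch in Python, not the PostgreSQL code: it assumes the waiter sleeps on its
semaphore in a loop until lwWaiting has been cleared, and that the waker clears
lwWaiting and then posts the semaphore.  The names Proc, wait_for_wakeup and
wake, and the post_semaphore switch, are hypothetical and exist only for
illustration.

```python
import threading


class Proc:
    """Hypothetical stand-in for a backend's PGPROC entry (two fields only)."""

    def __init__(self):
        self.lwWaiting = False              # True while queued on an LWLock
        self.sem = threading.Semaphore(0)   # the per-backend semaphore


def wait_for_wakeup(proc, timeout=None):
    """Model of the waiter's sleep loop.

    Blocks on the semaphore until lwWaiting has been cleared.  Returns the
    number of extra wakeups absorbed, or None if the semaphore wait timed
    out (the timeout is an addition of this sketch so that it can terminate).
    """
    extra_waits = 0
    while True:
        if not proc.sem.acquire(timeout=timeout):
            return None          # still asleep: nobody posted the semaphore
        if not proc.lwWaiting:
            return extra_waits   # genuine wakeup
        extra_waits += 1         # wakeup meant for something else; sleep again


def wake(proc, post_semaphore=True):
    """Model of the waker: clear the flag, then post the semaphore.

    post_semaphore=False simulates a post that is lost or never delivered.
    """
    proc.lwWaiting = False
    if post_semaphore:
        proc.sem.release()
```

In this model a normal wakeup lets wait_for_wakeup() return at once, while
calling wake(proc, post_semaphore=False) leaves the waiter asleep with
lwWaiting already false, which is the combination reported from gdb above.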