On Sat, Nov 25, 2017 at 05:45:59PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@xxxxxxxxxxxxx> writes:
> > We never had any issue during the ~2 years running PG96 on this VM, until
> > upgrading Monday to PG10.1, and we've now hit it 5+ times.
>
> > BTW this is a VM run on a hypervisor managed by our customer:
> > DMI: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 06/22/2012
>
> > Linux TS-DB 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
> Actually ... I was focusing on the wrong part of that. It's not
> your hypervisor, it's your kernel. Running four-year-old kernels
> is seldom a great idea, and in this case, the one you're using
> contains the well-reported missed-futex-wakeups bug:
>
> https://bugs.centos.org/view.php?id=8371
>
> While rebuilding PG so it doesn't use POSIX semaphores will dodge
> that bug, I think a kernel update would be a far better idea.
> There are lots of other known bugs in that version.
>
> Relevant to our discussion, the fix involves inserting a memory
> barrier into the kernel's futex call handling:

Ouch! Thanks for the heads up, and sorry for the noise.

I'm still trying to coax 3 customers off CentOS 5.x, so the 2 customers left
running CentOS 6.5 weren't on any of my mental lists.

Justin