Gavin Hamill <gdh@xxxxxxxxxxxxx> writes:
> On Fri, 07 Apr 2006 17:56:49 -0400
> Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> This is not good.  Did the semop storms coincide with visible
>> slowdown?  (I'd assume so, but you didn't actually say...)

> Yes, there's a definite correlation here.. I attached truss to the
> main postmaster..
> ...
> And when I saw a flood of semop's for any particular PID, a second later
> in the 'topas' process list would show that PID at a 100% CPU ...

So apparently we've still got a problem with multiprocess contention
for an LWLock somewhere.  It's not the BufMgrLock, because that's gone
in 8.1.  It could be one of the finer-grain locks that are still there,
or it could be someplace else.

Are you in a position to try your workload using PG CVS tip?  There's
a nontrivial possibility that we've already fixed this --- a couple of
months ago I did some work to reduce contention in the lock manager:

2005-12-11 16:02  tgl

	* src/: backend/access/transam/twophase.c,
	backend/storage/ipc/procarray.c, backend/storage/lmgr/README,
	backend/storage/lmgr/deadlock.c, backend/storage/lmgr/lock.c,
	backend/storage/lmgr/lwlock.c, backend/storage/lmgr/proc.c,
	include/storage/lock.h, include/storage/lwlock.h,
	include/storage/proc.h: Divide the lock manager's shared state
	into 'partitions', so as to reduce contention for the former
	single LockMgrLock.  Per my recent proposal.  I set it up for 16
	partitions, but on a pgbench test this gives only a marginal
	further improvement over 4 partitions --- we need to test more
	scenarios to choose the number of partitions.

This is unfortunately not going to help you as far as getting that
machine into production now (unless you're brave enough to run CVS tip
in production, which I certainly am not).  I'm afraid you're most likely
going to have to ship that pSeries back at the end of the month, but
while you've got it, it'd be awfully nice if we could use it as a
testbed ...

			regards, tom lane
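
For context on the partitioning change described in that commit message, the
sketch below illustrates the general approach: hash each heavyweight-lock tag
to one of a fixed number of partitions, each guarded by its own lock, so that
backends acquiring unrelated locks no longer serialize on a single
LockMgrLock.  This is a simplified standalone illustration, not the actual
PostgreSQL source; the type and function names used here (LockTag,
lock_tag_partition, acquire_heavyweight_lock), the toy hash, and the use of
pthread mutexes in place of LWLocks are assumptions made for the example only.

    /*
     * Simplified illustration only -- not the actual PostgreSQL lock manager.
     * The idea behind the 2005-12-11 change: hash each lock tag to one of
     * NUM_PARTITIONS partitions, each protected by its own lock, so that
     * backends working on unrelated locks do not all queue behind a single
     * lock-manager lock.
     */
    #include <pthread.h>
    #include <stdint.h>

    #define NUM_PARTITIONS 16                 /* the commit above settled on 16 */

    typedef struct LockTag                    /* stand-in for a real lock tag */
    {
        uint32_t dbid;
        uint32_t relid;
    } LockTag;

    static pthread_mutex_t partition_lock[NUM_PARTITIONS];
    /* ...each partition would also own its slice of the shared lock table... */

    void
    lock_partitions_init(void)
    {
        for (int i = 0; i < NUM_PARTITIONS; i++)
            pthread_mutex_init(&partition_lock[i], NULL);
    }

    /* Map a tag's hash value onto a partition. */
    static int
    lock_tag_partition(const LockTag *tag)
    {
        uint32_t hash = tag->dbid * 2654435761u ^ tag->relid;   /* toy hash */

        return (int) (hash % NUM_PARTITIONS);
    }

    /* Only the chosen partition is held; the other 15 remain free. */
    void
    acquire_heavyweight_lock(const LockTag *tag)
    {
        int part = lock_tag_partition(tag);

        pthread_mutex_lock(&partition_lock[part]);
        /* ...find or insert the lock entry in that partition's hash table... */
        pthread_mutex_unlock(&partition_lock[part]);
    }

The number of partitions is the tuning knob the commit message refers to:
more partitions make it less likely that unrelated lock tags contend for the
same partition lock, at the cost of more shared state and more locks to
visit in operations that must scan every partition.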