Tom Lane wrote:
>> We're seeing an average of 30,000 context-switches a sec. This problem
>> was much worse w/8.0 and got bearable with 8.1 but slowly resurfaced.
>
> Is this from LWLock or spinlock contention? strace'ing a few backends
> could tell the difference: look to see how many select(0,...) you see
> compared to semop()s. Also, how many of these compared to real work
> (such as read/write calls)?

Over a 20 second interval, I've got about 85 select()s and 6,230
semop()s. 2604 read()s vs 16 write()s.
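
For anyone wanting to repeat the count, here is a rough sketch of tallying syscalls from saved strace output and turning them into per-second rates. It assumes plain strace lines with no timestamp prefix (no -t/-r), optionally with a "[pid N]" or bare pid prefix from -f; the function names are made up for illustration and are not part of any tool.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line. Allows an optional
# "[pid N]" prefix (strace -f) or a bare leading pid (strace -f -o file).
# Assumption: no timestamp prefix, i.e. strace was run without -t/-r.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Return a Counter of syscall names seen in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal lines and "<... resumed>" lines do not match
            counts[match.group(1)] += 1
    return counts


def per_second(counts, interval_seconds):
    """Convert raw counts into calls per second over the traced interval."""
    return {name: n / interval_seconds for name, n in counts.items()}


if __name__ == "__main__":
    # The totals reported in this message, over a 20 second interval.
    reported = Counter({"select": 85, "semop": 6230, "read": 2604, "write": 16})
    for name, rate in sorted(per_second(reported, 20).items()):
        print(f"{name:>6}: {rate:.2f} calls/sec")
```

With the numbers above, semop() outnumbers select(0,...) by roughly 73 to 1.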

> Do you have any long-running transactions, and if so does shutting
> them down help? There's been some discussion about thrashing of the
> pg_subtrans buffers being a problem, and that's mainly a function of
> the age of the oldest open transaction.

Not long-running. We do have a badly behaving legacy app that is
leaving some backends "idle in transaction". They're gone pretty quickly,
so I can't kill them fast enough, but querying pg_stat_activity will
always show at least a handful. Could this be contributing?
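
A rough sketch of how the idle-in-transaction backends could be sampled and filtered by age, in case it helps pin down how long they really stay open. The SQL assumes the 8.1-era pg_stat_activity columns (procpid, usename, current_query, query_start) and that stats_command_string is on; query_start is only an approximation of when the session went idle. The helper name and the 30 second threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed query for an 8.1-era server; column names differ in later releases.
IDLE_IN_XACT_SQL = """
SELECT procpid, usename, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start
"""


def stale_sessions(rows, now, max_idle=timedelta(seconds=30)):
    """Filter (procpid, usename, query_start) rows down to old sessions.

    Returns (procpid, usename, idle_for) tuples for sessions whose last
    query started at least max_idle before now. Rows with no query_start
    are skipped.
    """
    stale = []
    for procpid, usename, query_start in rows:
        if query_start is None:
            continue
        idle_for = now - query_start
        if idle_for >= max_idle:
            stale.append((procpid, usename, idle_for))
    return stale


if __name__ == "__main__":
    # Made-up rows standing in for the result of IDLE_IN_XACT_SQL.
    now = datetime(2006, 1, 1, 12, 0, 0)
    rows = [
        (101, "legacy", now - timedelta(seconds=45)),
        (102, "legacy", now - timedelta(seconds=2)),
    ]
    for procpid, usename, idle_for in stale_sessions(rows, now):
        print(procpid, usename, idle_for)
```

Running the query every few seconds and logging the result would show whether any of these sessions outlive the "gone pretty quickly" impression.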

Based on the number of semop()s we're getting, it does look like
shared_memory may be getting thrashed. Any suggestions? We did try
cutting shared_memory usage in half the previous day, but that did
little to help (it didn't make performance any worse and we still saw
the high context switches, but it didn't make it any better either).
--
Sumbry][