On Thu, 13 Sep 2007, Brad Nicholson wrote:
One big difference though is that the old array had 16 GB of cache, the
new one has 4 GB.
We have enough IO to spare that we have the bgwriter cranked up pretty
high, dirty buffers are getting quickly.
If your system is very active, running the bgwriter very aggressively will
result in the cache on the disk array being almost continuously filled
with pending writes that then trickle their way onto real disk eventually.
If the typical working set on this system results in the amount of
background writer cached writes regularly being >4GB but <16GB, that would
explain what you're seeing. The resolution is actually unexpected by most
people: you make the background writer less aggressive so that it's
spewing less redundant writes clogging the array's cache, leaving more
cache to buffer the actual commits so they don't block. This will
increase the odds that there will be a checkpoint block instead, but if
you're seeing none of them right now you may have some margin there to
reduce the BGW activity without aggrevating checkpoints.
Since you're probably not monitoring I/O waits and similar statistics on
how the disk array's cache is being used, whether this is happening or not
to you won't be obvious from what the operating system is reporting. One
or two blocked writes every couple of minutes won't even show up on the
gross statistics for operating system I/O waits; on average, they'll still
be 0. So it's possible you may think you have plenty of I/O to spare,
when in fact you don't quite have enough--what you've got is enough cache
that the OS can't see where the real I/O bottleneck is.
I'd be curious to see how you've got your background writer configured to
see if it matches situations like this I've seen in the past. The
parameters controlling the all scan are the ones you'd might consider
turning down, definately the percentage and possibly the maxpages as well.
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster