On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
> I just discovered this on a LinkedIn user group: http://bugzilla.kernel.org/show_bug.cgi?id=12309
I would bet there are at least 3 different bugs in that one. That bug report got a lot of press via Slashdot a few months ago, and it's picked up all sorts of people who have I/O wait issues, but they don't all have the same cause. The 3ware-specific problem Laurent mentioned is an example. That's not the same thing most of the people there are running into; the typical reporter there has disks attached directly to their motherboard. The irony here is that #12309 was a fork of #7372, made to start over with a clean discussion slate, because the same thing happened to that earlier one.
The original problem reported there showed up in 2.6.20, so I've been able to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) on most of the production systems I deal with. (Except for my system with an Areca card--that one needs 2.6.22 or later to be stable, and seems to have no unexpected I/O wait issues. I think this is because the card takes over the lowest-level I/O scheduling from Linux when it pushes from its cache onto the disks.)
Some of the people there reported significant improvement by tuning the pdflush tunables; that's something I've had to do a few times now on systems to get rid of unexpected write lulls. I wrote up a walkthrough on one of them at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that goes over how to tell if you're running into that problem, and what to do about it; something else I wrote on that already made it into the bug report in comment #150.
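If you want a quick look at where a box stands before reading the full walkthrough, something like the following rough Python sketch can help. It is not taken from the blog post. It just compares the Dirty and Writeback figures in /proc/meminfo against the vm.dirty_background_ratio and vm.dirty_ratio settings. The helper names are made up for illustration, and measuring against MemTotal is a simplifying assumption: the kernel applies those ratios to a smaller "dirtyable" memory figure, so treat the percentage as approximate.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def dirty_percent(meminfo):
    """Dirty + Writeback as a percentage of MemTotal (an approximation)."""
    pending = meminfo.get("Dirty", 0) + meminfo.get("Writeback", 0)
    return 100.0 * pending / meminfo["MemTotal"]


def read_vm_tunable(name, root="/proc/sys/vm"):
    """Read one integer tunable, e.g. dirty_ratio, from /proc/sys/vm."""
    return int(Path(root, name).read_text().split()[0])


def main():
    meminfo_path = Path("/proc/meminfo")
    if not meminfo_path.exists():
        print("no /proc/meminfo here; this only works on Linux")
        return
    info = parse_meminfo(meminfo_path.read_text())
    print("dirty + writeback: %.1f%% of MemTotal" % dirty_percent(info))
    for name in ("dirty_background_ratio", "dirty_ratio"):
        print("vm.%s = %d" % (name, read_vm_tunable(name)))


if __name__ == "__main__":
    main()
```

Running it in a loop during a heavy write load shows whether the dirty percentage climbs toward dirty_ratio just before the stalls start. If it does, the write cache is a plausible suspect, though that alone doesn't prove it.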
--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance