2009/3/22 Greg Smith <gsmith@xxxxxxxxxxxxx>:
> On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
>
>> I just discovered this on a LinkedIn user group:
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> I would bet there's at least 3 different bugs in that one. That bug report
> got a lot of press via Slashdot a few months ago, and it's picked up all
> sorts of people who all have I/O wait issues, but they don't all have the
> same cause. The 3ware-specific problem Laurent mentioned is an example.
> That's not the same thing most of the people there are running into; the
> typical reporter there has disks attached directly to their motherboard.
> The irony here is that #12309 was a fork of #7372 to start over with a
> clean discussion slate, because the same thing happened to that earlier
> one.

That I/O wait problem is not 3ware-specific. A friend of mine has the same
problem (and fix) with aacraid. I'd bet a couple of coins that the
controllers showing this problem do not set MWI.

Quickly grepping the Linux sources (2.6.28.8) for pci_try_set_mwi (only
disk controllers shown here):

230:pata_cs5530.c
3442:sata_mv.c
2016:3w-9xxx.c
147:qla_init.c
2412:lpfc_init.c
171:cs5530.c

> The original problem reported there showed up in 2.6.20, so I've been able
> to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18)
> on most of the production systems I deal with. (Except for my system with
> an Areca card--that one needs 2.6.22 or later to be stable, and seems to
> have no unexpected I/O wait issues. I think this is because it's taking
> over the lowest-level I/O scheduling from Linux, when it pushes from the
> card's cache onto the disks.)

I thought about the Completely Fair Scheduler at first, but that one came
in around 2.6.21. Some tests were done with different I/O schedulers, and
they do not seem to be the real cause of the I/O wait. A bad interaction
between a hardware RAID card's cache and the system asking the card to
write at the same time could be a reason.
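For reference, that quick scan is just a recursive grep for the call that
enables Memory-Write-Invalidate. The sketch below is runnable anywhere
because it first builds a tiny hypothetical stand-in tree (demo/); against a
real kernel you would point the same grep at an unpacked
linux-2.6.28.8/drivers/ directory instead:

```shell
# Hypothetical stand-in tree so the command runs anywhere; a real run
# would grep linux-2.6.28.8/drivers/ after unpacking the kernel tarball.
mkdir -p demo/drivers/scsi
cat > demo/drivers/scsi/3w-9xxx.c <<'EOF'
static int twa_probe(struct pci_dev *pdev)
{
	pci_try_set_mwi(pdev); /* enable Memory-Write-Invalidate */
}
EOF
# -r: recurse into the tree, -n: print line numbers
# (same line:file information as the list quoted above)
grep -rn pci_try_set_mwi demo/drivers
```

A driver that never calls pci_try_set_mwi simply leaves the MWI bit in the
PCI command register at its default, which is the condition I'm betting on
above.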
Unfortunately, I've also met it with a now-retired box at work that was
running a single disk plugged into the motherboard controller. So there's
something else under the hood... but my (very) limited kernel knowledge
can't help more here.

> Some of the people there reported significant improvement by tuning the
> pdflush tunables; that's something I've had to do a few times now on
> systems to get rid of unexpected write lulls. I wrote up a walkthrough on
> one of them at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that
> goes over how to tell if you're running into that problem, and what to do
> about it; something else I wrote on that already made it into the bug
> report in comment #150.

I think that forcing the system to write more often, and in smaller
chunks, just hides the problem rather than correcting it. But well, that's
just a feeling, not science. I hope some real hacker will be able to spot
the problem(s) so they can be fixed.

Anyway, I'll keep a couple of coins on MWI as a source of the problem :-)

Regards,
Laurent

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance