[ ... ]

>> On two particular servers, with recent kernels, I experience a
>> much higher load than expected, but it's very hard to tell
>> what's wrong. The system seems to spend more time in I/O wait.
>> Older kernels (2.6.32.xx and 2.6.26.xx) give better results.

[ ... ]

> When I go back to older kernels, the load goes down. With a newer
> kernel everything works fine too, but the load (as reported by
> uptime) is higher.

[ ... ]

>> birnie:~/TRACE# uptime
>> 11:48:34 up 17:18, 3 users, load average: 0.04, 0.18, 0.23
>> penderyn:~/TRACE# uptime
>> 11:48:30 up 23 min, 3 users, load average: 4.03, 3.82, 3.21

[ ... ]

But 'uptime' reports the load average, which is (roughly) the number
of processes actually running on the CPU (on Linux it also counts
processes blocked in uninterruptible I/O sleep, which is why time
spent in I/O wait inflates it). If the load average is higher, that
usually means that the file system is running better, not worse. It
looks as if you are not clear whether you have a regression or an
improvement.

For a mail server the relevant metric is messages processed per
second, or alternatively the median and maximum times to process a
message, rather than the "average" number of processes running.

[ ... ]

>> As those servers are critical for us, I can't really test, can
>> hardly give you more precise numbers, and I don't know how to
>> accurately reproduce this platform to test what's wrong. I know
>> this is NOT a precise bug report and it won't help much. All I
>> can say is:
>> - read operations seem no slower with recent kernels; backups
>>   take approximately the same time;
>> - I'd say (but I have no proof) that delivery of new mail takes
>>   more time and is more synchronous than before, as if
>>   'nobarrier' had no effect.

> Did someone have time to examine the 2 blktraces? (and, by
> chance, can see the root cause of the increased load?)

So, for a large system-critical problem for which you yourself do
not have the resources to do testing, you are expecting quick
response times over the Christmas and New Year period. What's your
XFS Platinum Psychic Support Account number? :-)

> One of my servers is still running 3.1.6. In the coming days
> I'll see a very significant load increase (today is still calm).
> Is there anything I can do to go further?

As it is not clear whether you are complaining about better XFS
performance, it is hard to help. However, while things are still
calm, you can probably test your systems a bit by running Postmark
on both machines, as it reports relevant metrics.

[ ... ]

BTW, rereading the description of the setup:

>>>>> Those servers are mail (dovecot) servers, with lots of
>>>>> simultaneous IMAP clients (5000+) and lots of simultaneous
>>>>> message deliveries. These are Linux-VServers, on top of LVM
>>>>> volumes. The storage is SAN with 15k RPM SAS drives (and
>>>>> battery backup). I know barriers were disabled in older
>>>>> kernels, so with recent kernels, XFS volumes were mounted
>>>>> with nobarrier.

>>>> 1. What mailbox format are you using? Is this a constant
>>>> or variable?

>>> Maildir++

I am stunned by the sheer (euphemism alert) audacity of it all.
This setup is (euphemism alert) amazing. However, at least it is
Linux-VServers, while there are clueless sysadmins who set up mail
servers on top of virtual machines (and amazingly VMware encourages
that for Zimbra, which is a terrible combination, as Zimbra also
uses something like Maildir for the IMAP mailstore). The use of 15k
drives is also commendable.

Unfortunately the problem of large busy mailstores is vastly
underestimated by many, and XFS has little to do with it.
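PS: a few concrete sketches of what I mean, while things are still
calm. All paths, device names and run sizes below are illustrative
placeholders, not taken from your setup.

For messages per second and per-message times: if your MTA writes
Postfix-style "status=sent ... delay=" lines to /var/log/mail.log
(adjust the pattern to whatever yours actually logs), something
like this yields the count, median and maximum delivery delay:

  grep 'status=sent' /var/log/mail.log |
    grep -o 'delay=[0-9.]*' | cut -d= -f2 | sort -n |
    awk '{ d[NR] = $1 }
         END { printf "count=%d median=%.2fs max=%.2fs\n",
                      NR, d[int((NR + 1) / 2)], d[NR] }'

Run on both kernels over comparable periods, those numbers settle
the regression-or-improvement question far better than 'uptime'.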
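If anyone does get to the two blktraces, what helps most is that
the captures be comparable: same device, same duration, similar
load, one per kernel. For the record, a short capture looks roughly
like this (the LV name is an example):

  # capture 60 seconds of block-layer events from the mailstore LV
  blktrace -d /dev/mapper/vg0-mail -w 60 -o mailtrace
  # turn the binary trace into a readable event listing and summary
  blkparse -i mailtrace | tail -40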
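As to Postmark: it drives a small-file create/append/read/delete
mix not unlike a Maildir store and reports transactions per second.
A minimal interactive session against a scratch directory on each
machine's XFS volume might look like this (pool and transaction
counts are just plausible starting points, to be scaled to taste):

  $ postmark
  pm> set location /srv/mail/pmtest
  pm> set size 512 10240
  pm> set number 20000
  pm> set transactions 50000
  pm> run
  pm> quit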
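Finally, on the suspicion that 'nobarrier' has no effect: before
theorizing, confirm the option is actually in force on the running
kernel, for example:

  # the mount options actually in effect, not what fstab requests
  grep xfs /proc/mounts
  # XFS logs at mount time if it disables or cannot use barriers
  dmesg | grep -i barrier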