[ ... ]

>> On two particular servers, with recent kernels, I experience a
>> much higher load than expected, but it's very hard to tell
>> what's wrong. The system seems to spend more time in I/O wait.
>> Older kernels (2.6.32.xx and 2.6.26.xx) give better results.

[ ... ]

> When I go back to older kernels, the load goes down. With a newer
> kernel everything works fine too, but the load (as reported by
> uptime) is higher.

[ ... ]

>> birnie:~/TRACE# uptime
>> 11:48:34 up 17:18, 3 users, load average: 0.04, 0.18, 0.23
>> penderyn:~/TRACE# uptime
>> 11:48:30 up 23 min, 3 users, load average: 4.03, 3.82, 3.21

[ ... ]

But 'uptime' reports the load average, which is (roughly) the number
of processes actually running on the CPU (on Linux it also counts
processes blocked in uninterruptible I/O sleep, which is why time
spent in I/O wait inflates it). If the load average is higher, that
usually means that the file system is running better, not worse. It
looks as if you are not clear whether you have a regression or an
improvement.

For a mail server the relevant metric is messages processed per
second, or alternatively the median and maximum times to process a
message, rather than the "average" number of processes running.

[ ... ]

>> As those servers are critical for us, I can't really test, can
>> hardly give you more precise numbers, and I don't know how to
>> accurately reproduce this platform to test what's wrong. I know
>> this is NOT a precise bug report and it won't help much. All I
>> can say is:
>> - read operations seem no slower with recent kernels; backups
>>   take approximately the same time;
>> - I'd say (but I have no proof) that delivery of new mail takes
>>   more time and is more synchronous than before, as if
>>   'nobarrier' had no effect.

> Did someone have time to examine the 2 blktraces? (and, by
> chance, can see the root cause of the increased load?)

So, for a large system-critical problem for which you yourself do
not have the resources to do testing, you are expecting quick
response times over the Christmas and New Year period. What's your
XFS Platinum Psychic Support Account number? :-)

> One of my servers is still running 3.1.6. In the coming days
> I'll see a very significant load increase (today is still calm).
> Is there anything I can do to go further?

As it is not clear whether you are complaining about better XFS
performance, it is hard to help. However, while things are still
calm, you can probably test your systems a bit by running Postmark
on both machines, as it reports relevant metrics.

[ ... ]

BTW, rereading the description of the setup:

>>>>> Those servers are mail (dovecot) servers, with lots of
>>>>> simultaneous IMAP clients (5000+) and lots of simultaneous
>>>>> message deliveries. These are Linux-VServers, on top of LVM
>>>>> volumes. The storage is SAN with 15k RPM SAS drives (and
>>>>> battery backup). I know barriers were disabled in older
>>>>> kernels, so with recent kernels, XFS volumes were mounted
>>>>> with nobarrier.

>>>> 1. What mailbox format are you using? Is this a constant
>>>> or variable?

>>> Maildir++

I am stunned by the sheer (euphemism alert) audacity of it all.
This setup is (euphemism alert) amazing. However, at least it is
Linux-VServers, while there are clueless sysadmins who set up mail
servers on top of virtual machines (and amazingly VMware encourages
that for Zimbra, which is a terrible combination, as Zimbra also
uses something like Maildir for the IMAP mailstore). The use of 15k
drives is also commendable.

Unfortunately the problem of large busy mailstores is vastly
underestimated by many, and XFS has little to do with it.
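PS: a few concrete sketches of what I mean, while things are still
calm. All paths, device names and run sizes below are illustrative
placeholders, not taken from your setup.

For messages per second and per-message times: if your MTA writes
Postfix-style "status=sent ... delay=" lines to /var/log/mail.log
(adjust the pattern to whatever yours actually logs), something
like this yields the count, median and maximum delivery delay:

  grep 'status=sent' /var/log/mail.log |
    grep -o 'delay=[0-9.]*' | cut -d= -f2 | sort -n |
    awk '{ d[NR] = $1 }
         END { printf "count=%d median=%.2fs max=%.2fs\n",
                      NR, d[int((NR + 1) / 2)], d[NR] }'

Run on both kernels over comparable periods, those numbers settle
the regression-or-improvement question far better than 'uptime'.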
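If anyone does get to the two blktraces, what helps most is that
the captures be comparable: same device, same duration, similar
load, one per kernel. For the record, a short capture looks roughly
like this (the LV name is an example):

  # capture 60 seconds of block-layer events from the mailstore LV
  blktrace -d /dev/mapper/vg0-mail -w 60 -o mailtrace
  # turn the binary trace into a readable event listing and summary
  blkparse -i mailtrace | tail -40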
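As to Postmark: it drives a small-file create/append/read/delete
mix not unlike a Maildir store and reports transactions per second.
A minimal interactive session against a scratch directory on each
machine's XFS volume might look like this (pool and transaction
counts are just plausible starting points, to be scaled to taste):

  $ postmark
  pm> set location /srv/mail/pmtest
  pm> set size 512 10240
  pm> set number 20000
  pm> set transactions 50000
  pm> run
  pm> quit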
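Finally, on the suspicion that 'nobarrier' has no effect: before
theorizing, confirm the option is actually in force on the running
kernel, for example:

  # the mount options actually in effect, not what fstab requests
  grep xfs /proc/mounts
  # XFS logs at mount time if it disables or cannot use barriers
  dmesg | grep -i barrier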