Re: Bad performance with XFS + 2.6.38 / 2.6.39

Yann Dupont <Yann.Dupont@xxxxxxxxxxxxxx> · Wed, 04 Jan 2012 11:54:43 +0100

On 02/01/2012 17:08, Peter Grandi wrote:
[ ... ]

On two particular servers, with recent kernels, I experience a
much higher load than expected, but it's very hard to tell
what's wrong. The system seems to spend more time in I/O wait.
Older kernels (2.6.32.xx and 2.6.26.xx) give better results.
[ ... ]
When I go back to older kernels, the load goes down. With newer
kernels, everything works well too, but the load (as reported
by uptime) is higher.
[ ... ]
birnie:~/TRACE# uptime
   11:48:34 up 17:18,  3 users,  load average: 0.04, 0.18, 0.23

penderyn:~/TRACE# uptime
   11:48:30 up 23 min,  3 users,  load average: 4.03, 3.82, 3.21
[ ... ]

But 'uptime' reports the load average, which is (roughly)
processes actually running on the CPU. If the load average is

More or less. I generally have 5000+ processes on those servers. The
load generally reflects a mix of CPU usage (which is unchanged, as the
dovecot setup is unchanged) and I/O wait. So naively, I'd say that if the
load average is higher than usual, that's because I/O wait is higher.

As the kernel had big changes, it could be XFS, but it could be DM or
the I/O scheduler as well.

But that doesn't seem to be the case.

higher, that usually means that the file system is running
better, not worse.

If delivery is I/O bound, yes, but that's not the case in this particular
setup.
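For reference, on Linux the load average counts tasks that are runnable
(state R) plus tasks in uninterruptible sleep (state D, usually waiting
on I/O), so a higher figure alone does not say which of the two grew.
Below is a minimal sketch, assuming a Linux /proc filesystem, that
splits the two. The function names are illustrative and not part of any
existing tool.

```python
import glob
from collections import Counter


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    # Fourth field is "<currently runnable>/<total scheduling entities>".
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def state_from_stat(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' before reading the state.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Count tasks per state letter (R, S, D, ...)."""
    return Counter(state_from_stat(line) for line in stat_lines)


def read_all_stat_lines(proc="/proc"):
    """Read every /proc/<pid>/stat, skipping processes that exit meanwhile."""
    lines = []
    for path in glob.glob(f"{proc}/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue
    return lines


def report():
    """Print the load average next to the current R and D task counts."""
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
    states = count_states(read_all_stat_lines())
    print("R (runnable):", states["R"])
    print("D (uninterruptible, usually I/O):", states["D"])
```

Calling report() a few times on both servers would show whether the
extra load comes from tasks in state D or from tasks in state R. It is
only a point-in-time sample, so it complements rather than replaces
vmstat or iostat.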

It looks as if you are not clear whether you
have a regression or an improvement.

I was just reporting an unusual load average, nothing else. As far as I
can see, response times are still fine. I'm not experiencing a
performance problem. I'm not the original author of the thread. I probably
should have changed the subject of the thread, sorry for that.

For a mail server the relevant metric is messages processed per
second, or alternatively median and maximum times to process a
message, rather than "average" processes running.
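The metrics named here (messages per second, median and maximum time per
message) can be sketched as follows. This assumes each delivery has
already been reduced to a (start, end) pair of timestamps in seconds,
for example from delivery logs. That input format and the function name
are hypothetical, not something dovecot provides as such.

```python
from statistics import median


def delivery_metrics(deliveries):
    """Summarise deliveries given as (start_seconds, end_seconds) pairs."""
    if not deliveries:
        raise ValueError("no deliveries to summarise")
    # Time spent processing each individual message.
    durations = [end - start for start, end in deliveries]
    # Observation window: first start to last end.
    first_start = min(start for start, _ in deliveries)
    last_end = max(end for _, end in deliveries)
    window = last_end - first_start
    return {
        "messages": len(deliveries),
        "per_second": len(deliveries) / window if window > 0 else float("inf"),
        "median_s": median(durations),
        "max_s": max(durations),
    }
```

Comparing these figures between the old and new kernels over the same
kind of traffic would say more about a regression than the load average
does.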

...
So you are expecting for a large system critical problem for
which you yourself do not have the resource to do testing to see
quick response times over the Christmas and New Year period.
What's your XFS Platinum Psychic Support Account number? :-)

I'm not expecting anything. I know open source. All is working fine,
thank you. I was just "upping" because I saw that my traces had been
downloaded last week. It's not always easy for non-native speakers to
send mails without sounding aggressive or offensive. If that was the
case, I can assure you that was not the intent.

BTW rereading the description of the setup:

Those servers are mail (dovecot) servers, with lots of
simultaneous imap clients (5000+) and lots of simultaneous
message deliveries. These are linux-vservers, on top of LVM
volumes. The storage is SAN with 15k RPM SAS drives (and
battery backup). I know barriers were disabled in older
kernels, so with recent kernels, XFS volumes were mounted
with nobarrier.

1. What mailbox format are you using?  Is this a constant
or variable?
Maildir++

I am stunned by the sheer (euphemism alert) audacity of it all.
This setup is (euphemism alert) amazing.

Can you elaborate, please? This particular setup has been running fine
for 7 years now, has scaled up very well (up to 70k mailboxes with a
similar setup for students) with few modifications (replacing Courier
by dovecot and upgrading servers, for example) and has proved very
stable, even through numerous power outages.

I can give you the detailed setup if you want, off list; I think it has
nothing to do with XFS.

Unfortunately the problem of large busy mailstores is vastly
underestimated by many, and XFS has little to do with it.

I'm really not sure I underestimate it, but I'll be glad to hear your
recommendations. Off list, I think.

Cheers,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx
