Re: Disappointing performance of copy (MD raid + XFS)

Asdo <asdo@xxxxxxxxxxxxx> · Fri, 11 Dec 2009 02:41:32 +0100

Eric Sandeen wrote:
Gabor Gombas wrote:
Kristleifur Daðason wrote:
[CUT]

Thank you guys for your help

I have done further investigation.

I still have not checked performance with very small files and
multiple simultaneous rsyncs.

I have checked the other problem I mentioned: I couldn't get above
150MB/sec even with large files and multiple simultaneous transfers.
I can confirm it, and I have narrowed it down: two XFS defaults
(meant as optimizations) actually hurt performance.

The first and most important is aligned writes: cat /proc/mounts
lists this (autodetected) stripe geometry: "sunit=2048,swidth=28672".
My chunks are 1MB and I have 16 disks in raid-6, so 14 data disks. Do
you think it's correct? xfs_info reports 4k blocks and gives sunit and
swidth in 4k blocks, so its values are very different. Please do not
use the same names "sunit"/"swidth" to mean 2 different things in 2
different places; it can confuse the user (me!)
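
To make the unit question concrete, here is a minimal sketch of the
conversion. It assumes the sunit/swidth values shown in /proc/mounts
are in 512-byte sectors (as documented for the XFS mount options),
while xfs_info reports them in filesystem blocks. The function name
stripe_geometry is only illustrative.

```python
SECTOR_BYTES = 512  # assumed unit of the sunit/swidth mount options


def stripe_geometry(sunit_sectors, swidth_sectors, fs_block_bytes=4096):
    """Convert sunit/swidth from 512-byte sectors to bytes and fs blocks."""
    sunit_bytes = sunit_sectors * SECTOR_BYTES
    swidth_bytes = swidth_sectors * SECTOR_BYTES
    return {
        "sunit_bytes": sunit_bytes,
        "swidth_bytes": swidth_bytes,
        # The same geometry expressed in filesystem blocks (xfs_info style)
        "sunit_blocks": sunit_bytes // fs_block_bytes,
        "swidth_blocks": swidth_bytes // fs_block_bytes,
        # Stripe width divided by stripe unit gives the number of data disks
        "data_disks": swidth_sectors // sunit_sectors,
    }


# Values taken from the /proc/mounts line quoted above
geom = stripe_geometry(2048, 28672)
print(geom)
```

Under that assumption the mount values work out to a 1 MiB stripe unit
across 14 data disks, i.e. 256 and 3584 when expressed in 4k blocks,
which would match a 16-disk raid-6 with 1MB chunks.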

Anyway that's not the problem: I have tried specifying other values in
my mount (in particular the values sunit and swidth would have if they
were expressed in 4k blocks), but ANY aligned XFS mount kills
performance for me. I have to specify "noalign" in my mount to go
fast. (Also note that this option cannot be changed with mount -o
remount; I have to unmount.)

The other default that kills performance for me is the rotorstep. I
have to max it out at 255 to get good performance. It actually seems
reasonable that a higher rotorstep should be faster... so why is 1 the
default? Why does it even exist? With low values the await (iostat -x
1) increases, I guess because of the seeks, and stripe_cache_active
stays higher, because there are fewer completely filled stripes.

If I use noalign and rotorstep at 255 I get 325 MB/sec on average (16
parallel transfers of 7MB files), while with defaults I get about
90 MB/sec.

Also, with noalign and rotorstep at 255, stripe_cache_active usually
stays in the lower half (below 16000 out of 32000), while with
defaults it is stuck at the maximum most of the time, and processes
end up sleeping in MD locks for this reason.
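
For anyone who wants to watch the same thing, here is a small sketch
for reading the stripe cache occupancy from sysfs. It assumes the array
exposes stripe_cache_active and stripe_cache_size under
/sys/block/<md device>/md/ (md0 below is just an example name);
stripe_cache_usage is a hypothetical helper, not an existing tool.

```python
from pathlib import Path


def stripe_cache_usage(md_dir):
    """Return (active, size, fraction) read from an md sysfs directory.

    md_dir is assumed to look like /sys/block/md0/md and to contain the
    files stripe_cache_active and stripe_cache_size (raid5/6 arrays).
    """
    md_dir = Path(md_dir)
    active = int((md_dir / "stripe_cache_active").read_text().strip())
    size = int((md_dir / "stripe_cache_size").read_text().strip())
    # Fraction of the configured stripe cache that is currently in use
    return active, size, active / size
```

Calling stripe_cache_usage("/sys/block/md0/md") once a second next to
iostat -x 1 should show whether the cache sits near its limit (a
fraction close to 1.0) or stays in the lower half.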

Do you know of the sunit/swidth alignment mechanism being broken on
2.6.31, or more specifically on 2.6.31 ubuntu generic-14?

(Kristleifur, thank you, I have seen your mention of the Ubuntu vs
vanilla kernel. I will try a vanilla one but right now I can't.
However, now that I have narrowed down the problem, XFS people might
want to look at the alignment problem more specifically.)

Regarding my previous post, I would still like to know what the stack
traces I posted there mean: what are the functions
xlog_state_get_iclog_space+0xed/0x2d0 [xfs]
and
xfs_buf_lock+0x1e/0x60 [xfs]
and what are they waiting for?
These are still the places where processes get stuck, even after
working around the alignment/rotorstep problem.

And then a few questions on inode64:
- if I start using inode64, do I have to remember to use inode64 on
every subsequent mount for the life of that filesystem? Or does it
record in some filesystem info region that the option has been used
once, so that inode64 is applied automatically on subsequent mounts?
- if I use a 64bit linux distro, will ALL userland programs
automatically support 64bit inodes, or do I have to pay attention all
the time and risk damaging my data?

Thanks for your help