Re: Problem about very high Average Read/Write Request Time

Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> · Sun, 19 Oct 2014 16:16:56 -0500

On 10/18/2014 04:26 AM, quanjun hu wrote:
> Hi,
>    I am using xfs on a raid 5 (~100TB) and put log on external ssd device, the mount information is:
> /dev/sdc on /data/fhgfs/fhgfs_storage type xfs (rw,relatime,attr2,delaylog,logdev=/dev/sdb1,sunit=512,swidth=15872,noquota).
>   when doing only reading / only writing , the speed is very fast(~1.5G), but when do both the speed is very slow (100M), and high r_await(160) and w_await(200000).
>    1. how can I reduce average request time?
>    2. can I use ssd as write/read cache for xfs?

You apparently have 31 effective SATA 7.2k RPM spindles with a 256 KiB chunk and a 7.75 MiB stripe width, in RAID5.  That should yield 3-4.6 GiB/s of streaming throughput, assuming no cable, expander, or HBA limitations.  You're achieving only a third to a half of this.  Which hardware RAID controller is this?  What are the specs: cache RAM, host and back end cable count and type?
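For reference, here is how those numbers fall out of the mount line.  This is only a sketch of the arithmetic; it relies on the sunit and swidth mount options being expressed in 512-byte sectors, and the variable names are mine.

```shell
#!/bin/sh
# Values copied from the mount line in the quoted message.
sunit=512      # stripe unit, in 512-byte sectors
swidth=15872   # stripe width, in 512-byte sectors

chunk_kib=$(( sunit * 512 / 1024 ))     # per-drive chunk, in KiB
stripe_kib=$(( swidth * 512 / 1024 ))   # one full stripe, in KiB
data_spindles=$(( swidth / sunit ))     # drives holding data in one stripe

echo "chunk=${chunk_kib}KiB stripe=${stripe_kib}KiB data_spindles=${data_spindles}"
```

7936 KiB is the 7.75 MiB stripe width mentioned above.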

When you say read or write is fast individually, but read+write is slow, what types of files are you reading and writing, and how many in parallel?  The combined pattern is likely causing the slowdown through excessive seeking in the drives.

As others mentioned, this isn't an XFS problem.  The problem is that your RAID geometry doesn't match your workload.  Your very wide parity stripe is apparently causing excessive seeking with your read+write workload due to read-modify-write (RMW) operations.  To mitigate this, and to increase resiliency, you should switch to RAID6 with a smaller chunk.  If you need maximum capacity, make a single RAID6 array with a 16 KiB chunk size.  With 30 data spindles this will yield a 480 KiB stripe width, increasing the odds that all writes are a full stripe, and hopefully eliminating much of the RMW problem.
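To illustrate why stripe width matters here, a rough sketch of the full-stripe test.  It assumes every write starts on a stripe boundary and ignores any coalescing the controller cache might do; the function name and the 1 MiB write size are made up for the example.

```shell
#!/bin/sh
# write_kind WRITE_KIB STRIPE_KIB
# Assumes the write starts on a stripe boundary. A write that is not a
# whole multiple of the stripe width leaves a partial stripe, whose parity
# can only be updated after reading back existing data or parity.
write_kind() {
    if [ $(( $1 % $2 )) -eq 0 ]; then
        echo "full-stripe"
    else
        echo "partial (read-modify-write)"
    fi
}

write_kind 1024 7936   # hypothetical 1 MiB write vs. the current 7.75 MiB stripe
write_kind 7936 7936   # a write sized to exactly one current stripe
```

The narrower the stripe, the more of your writes land in the first category.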

A better option might be making three 10-drive RAID6 arrays (two spares) with a 32 KiB chunk and 256 KiB stripe width, and concatenating the 3 arrays into an mdadm linear device (--level=linear).  You'd have 24 spindles of capacity and throughput instead of 31, but no more RMW operations, or at least very few.  You'd format the linear md device with:

# mkfs.xfs -d su=32k,sw=8 /dev/mdX
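The linear device has to exist before that mkfs.xfs step.  A minimal sketch of creating it, assuming the three RAID6 LUNs show up as /dev/sdc, /dev/sdd and /dev/sde and that /dev/md0 is unused (all of these names are hypothetical; substitute your own).  The script only prints the mdadm command so you can check it before running it as root.

```shell
#!/bin/sh
# Hypothetical device names: the three RAID6 LUNs and the md device to create.
LUNS="/dev/sdc /dev/sdd /dev/sde"
MD=/dev/md0

# Print the mdadm invocation that concatenates the LUNs end to end.
linear_cmd() {
    set -- $LUNS   # split the list into positional parameters
    echo "mdadm --create $MD --level=linear --raid-devices=$# $*"
}

linear_cmd
```

The order of the devices on the command line is the order in which they are concatenated.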

As long as your file accesses are spread fairly evenly across at least 3 directories you should achieve excellent parallel throughput, though single file streaming throughput will peak at 800-1200 MiB/s, that of 8 drives.  With a little understanding of how this setup works, you can write two streaming files and read a third without any of the 3 competing with one another for disk seeks/bandwidth--which is your current problem.  Or you could do one read and one write to each of 3 directories, and no pair would interfere with the other pairs.  Scale up from here.

Basically what we're doing is isolating each RAID LUN into a set of directories.  When you write to one of those directories the file goes into only one of the 3 RAID arrays.  Doing this confines the RMWs for a given write to a subset of your disks, and minimizes the number of seeks generated by parallel accesses.
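A sketch of the mapping behind this.  XFS splits the filesystem into equal allocation groups (AGs) laid out in block order and, with the inode64 allocator, tends to put each new directory in a different AG and keep a directory's files in that same AG.  On a linear concat of three equal arrays, an AG count that is a multiple of 3 keeps every AG inside a single array.  The agcount of 24 below is a made-up value for illustration, and the function name is mine.

```shell
#!/bin/sh
# ag_to_array AG AGCOUNT NARRAYS
# Assumes equal-sized arrays and AGCOUNT evenly divisible by NARRAYS.
# Prints the zero-based index of the array that holds the given AG.
ag_to_array() {
    echo $(( $1 / ($2 / $3) ))
}

agcount=24   # hypothetical; could be set with mkfs.xfs -d agcount=24
for ag in 0 7 8 15 16 23; do
    echo "AG $ag -> array $(ag_to_array "$ag" "$agcount" 3)"
done
```

On a live filesystem, xfs_bmap -v on a file lists the AG of each extent, which lets you check which array a given directory actually landed on.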

Cheers,
Stan
