I am copying here (without endorsement, and with only a couple of side
notes) some extracts by different people from a thread on the XFS
mailing list that seems mostly about RAID tuning:

http://article.gmane.org/gmane.comp.file-systems.xfs.general/65029

>>> Hi, I am using XFS on a RAID5 (~100 TB) with the log on an external
>>> SSD device; the mount information is:
>>>
>>>   /dev/sdc on /data/fhgfs/fhgfs_storage type xfs
>>>   (rw,relatime,attr2,delaylog,logdev=/dev/sdb1,sunit=512,swidth=15872,noquota)
>>>
>>> When doing only reading or only writing the speed is very fast
>>> (~1.5 GB/s), but when doing both the speed is very slow (100 MB/s),
>>> with high 'r_await' (160 ms) and 'w_await' (200000 ms).
>>>
>>> 1. How can I reduce the average request time?
>>> 2. Can I use the SSD as a write/read cache for XFS?

http://article.gmane.org/gmane.comp.file-systems.xfs.general/65034

>> There is a ratio of 31 (thirty-one) between 'swidth' and 'sunit', and
>> assuming that this reflects the geometry of the RAID5 set, and given
>> commonly available disk sizes, it can be guessed that with amazing
>> "bravery" someone has configured a RAID5 out of 32 (thirty-two) high
>> capacity/low IOPS 3 TB drives, or something similar. It is even
>> "braver" than that: if the mount point "/data/fhgfs/fhgfs_storage" is
>> descriptive, this "brave" RAID5 set is supposed to hold the object
>> storage layer of a BeeGFS highly parallel filesystem, and therefore
>> will likely see mostly-random accesses. This issue should be moved to
>> the 'linux-raid' mailing list as, from the reported information, it
>> has nothing to do with XFS.

http://article.gmane.org/gmane.comp.file-systems.xfs.general/65036

> You apparently have 31 effective SATA 7.2k RPM spindles with a 256 KiB
> chunk and a 7.75 MiB stripe width, in RAID5. That should yield
> 3-4.6 GiB/s of streaming throughput assuming no cable, expander, or
> HBA limitations. You're achieving only 1/3 to 1/2 of this. Which
> hardware RAID controller is this? What are the specs? Cache RAM, host
> and back-end cable count and type?
>
> When you say read or write is fast individually, but read+write is
> slow, what types of files are you reading and writing, and how many in
> parallel? This combined pattern is likely the cause of the slowdown,
> due to excessive seeking in the drives. As others mentioned, this
> isn't an XFS problem.
>
> The problem is that your RAID geometry doesn't match your workload.
> Your very wide parity stripe is apparently causing excessive seeking
> with your read+write workload due to read-modify-write (RMW)
> operations. To mitigate this, and to increase resiliency, you should
> switch to RAID6 with a smaller chunk. If you need maximum capacity,
> make a single RAID6 array with a 16 KiB chunk size. This will yield a
> 496 KiB stripe width, increasing the odds that all writes are a full
> stripe, and hopefully eliminating much of the RMW problem.
>
> A better option might be making three 10-drive RAID6 arrays (two
> spares) with a 32 KiB chunk and a 256 KiB stripe width, and
> concatenating the 3 arrays with 'mdadm --linear'. You'd have 24
> spindles of capacity and throughput instead of 31, but no more RMW
> operations, or at least very few. You'd format the linear md device
> with:
>
>   mkfs.xfs -d su=32k,sw=8 /dev/mdX
>
> As long as your file accesses are spread fairly evenly across at least
> 3 directories you should achieve excellent parallel throughput, though
> single-file streaming throughput will peak at 800-1200 MiB/s, that of
> 8 drives.
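As a side note from me rather than from the thread: those figures can
be checked with a little shell arithmetic, relying on the convention
that the 'sunit' and 'swidth' mount options are reported in 512-byte
sectors:

    # 'sunit'/'swidth' in the mount output are counts of 512-byte sectors
    echo $(( 512 * 512 / 1024 ))      # sunit:  256 KiB chunk
    echo $(( 15872 * 512 / 1024 ))    # swidth: 7936 KiB = 7.75 MiB stripe
    echo $(( 15872 / 512 ))           # 31 data spindles, hence the guessed 32-drive RAID5

The same message then continues: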
> With a little understanding of how this setup works, you can write
> two streaming files and read a third without any of the three
> competing with one another for disk seeks/bandwidth, which is your
> current problem. Or you could do one read and one write to each of
> the 3 directories, and no pair would interfere with the other pairs.
> Scale up from here.
>
> Basically what we're doing is isolating each RAID LUN behind a set of
> directories. When you write to one of those directories the file goes
> into only one of the 3 RAID arrays. Doing this isolates the RMWs for
> a given write to only a subset of your disks, and minimizes the
> number of seeks generated by parallel accesses.
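As a second side note, here is a sketch of how the suggested layout
might look as actual commands, with entirely hypothetical device names
and drive letters, and assuming the drives can be exposed individually
(JBOD) rather than only through the existing hardware RAID controller:

    # Three 10-drive RAID6 arrays with a 32 KiB chunk: 8 data spindles
    # each, so a 256 KiB stripe width per array (drive names made up).
    mdadm --create /dev/md1 --level=6 --raid-devices=10 --chunk=32 /dev/sd[b-k]
    mdadm --create /dev/md2 --level=6 --raid-devices=10 --chunk=32 /dev/sd[l-u]
    mdadm --create /dev/md3 --level=6 --raid-devices=10 --chunk=32 /dev/sd[v-z] /dev/sda[a-e]

    # Concatenate the three arrays and format with the matching
    # su/sw geometry quoted above.
    mdadm --create /dev/md4 --level=linear --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
    mkfs.xfs -d su=32k,sw=8 /dev/md4

The two drives left over out of the guessed 32 would be the spares
mentioned in the message.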