On Mon, Dec 05, 2011 at 01:50:58PM -0500, Paul Anderson wrote:
> I've set up a software RAID-60 array composed of 7 software RAID6s,
> each with 32k chunks, 18 devices total (16 data, 2 parity), and in
> theory appropriate setup parameters according to a nice white paper
> written by Christoph and presented this last summer at LinuxCon.
>
> My question is, if the mdraid and XFS are all configured properly,
> would I expect to see any read operations when doing a write-only
> test? I would have assumed that I would not, since XFS should write
> stripe-aligned sets of data, and in theory nothing needs to be read
> (no read-modify-write going on, I would think).

That depends. What's your "write only" test?

> The performance is great, but I'm wondering if I need to keep looking.

If performance is great, then what's the problem?

> Thanks,
>
> Paul Anderson
>
> Here's the details for kernel 2.6.38.5:
>
> mdadm --detail /dev/md0 (md1, md2, md3, md4, md5, and md6 all the same)
> /dev/md0: ....
>     Chunk Size : 32K
>
> /dev/md8 is the RAID0 that concatenates the above RAID6s, making a
> single RAID60:
>
> mdadm --detail /dev/md8
> /dev/md8: ....
>     Chunk Size : 4096K (this is what the RAID0 container thinks, but
>     I ignore it for xfs)

You should set the RAID0 chunk size to the stripe width of the
underlying RAID6 volumes, i.e. 16 data disks x 32k chunk = 512k.

> xfs_info /exports/
> meta-data=/dev/md8       isize=256    agcount=204, agsize=268435448 blks
>          =               sectsz=512   attr=2
> data     =               bsize=4096   blocks=54698370048, imaxpct=1
>          =               sunit=8      swidth=1024 blks

The reads you're seeing are there because XFS has clearly not been
configured correctly. You've given it a stripe unit of 32k (the RAID6
chunk size) and a stripe width of 4MB (the RAID0 chunk size). What you
are doing is aligning allocation to individual disks in the RAID6
volumes, but the filesystem doesn't know what the stripe width of
those volumes is, so it can't align correctly to the RAID6 geometry.
And because it is not set up with a sunit of 128 (512k), it can't
align to the RAID0 on top of them correctly, either.

You need to align all layers of the stack to each other so the
filesystem has a consistent view of stripe units and widths. In this
configuration, the RAID0 really needs a chunk size of 512k to match
the RAID6 stripe width. Then you can choose between two different
valid alignments for the filesystem - align to the underlying RAID6
volumes, or to the top-level RAID0.

If you have a small-file-intensive workload, then aligning to the
RAID6 is probably best, so that small files can pack full RAID6 stripe
widths. If you have a bandwidth-intensive workload, then aligning to
the RAID0 is probably best, so that large writes are aligned to the
full stripe width of the underlying RAID6 devices. Either way, you
need to understand and test your workload to improve on whatever the
default XFS settings give you.

> I made the filesystem like this:
> mkfs.xfs -L $(hostname) -l su=32768 -d su=32768,sw=128 /dev/md8
>
> mount options: inode64,largeio,swalloc,delaylog,logbsize=256k,logbufs=8,noatime,nodiratime

Why largeio,swalloc? Have you determined that you're actually getting
hot disks in your array without them? FWIW, delaylog and logbufs=8 are
the defaults, so you don't need to set them, and nodiratime is a
subset of noatime, so you don't need to specify that, either.

> I intended to make it with an external log, but forgot.

So you've determined that an internal log is a performance bottleneck
for your workload?
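As a sketch of what the aligned stack could look like - the device
names and exact mdadm invocations below are illustrative assumptions,
not taken from your setup, so adjust them to your disks and mdadm
version:

  # Seven 18-disk RAID6 legs with 32k chunks (example device names);
  # each leg's stripe width is 16 data disks x 32k = 512k.
  mdadm --create /dev/md0 --level=6 --raid-devices=18 --chunk=32 \
        /dev/sd[a-r]
  # ... and likewise for /dev/md1 through /dev/md6 ...

  # RAID0 across the seven legs, with its chunk size matched to the
  # 512k RAID6 stripe width:
  mdadm --create /dev/md8 --level=0 --raid-devices=7 --chunk=512 \
        /dev/md[0-6]

  # Alignment option 1 - align XFS to the RAID6 legs (small files):
  # su = the 32k RAID6 chunk, sw = the 16 data disks per leg.
  mkfs.xfs -L $(hostname) -d su=32k,sw=16 /dev/md8

  # Alignment option 2 - align XFS to the top-level RAID0 (bandwidth):
  # su = the 512k RAID0 chunk, sw = the 7 RAID6 legs.
  mkfs.xfs -L $(hostname) -d su=512k,sw=7 /dev/md8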
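And the corresponding mount line with the redundant options dropped
(keeping largeio,swalloc only until you've verified whether they
actually help your workload; /exports is assumed from the xfs_info
above):

  mount -o inode64,largeio,swalloc,logbsize=256k,noatime /dev/md8 /exports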
Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx