On Mon, Nov 11, 2013 at 06:25:13PM +0100, Bernd Schubert wrote:
> Hi all,
>
> for streaming writes onto a raid6 the current round-robin ag
> selection does not seem to be optimal. Writing 4 files from 4
> threads into a single directory we get 900 MB/s,

IOWs, writing all 4 files into the same AG, interleaving them into the
same physical location on disk.

> writing 4 files in
> 4 different directories we only get 700 MB/s (12 disks with hw
> megaraid-sas).

And that writes the 4 files into 4 different AGs, separating them into
physically different regions of the disk. There are seeks between the
streams there, and cheap RAID controllers often have problems with
internal caching algorithms being unable to minimise seeks between
streams effectively.

> The current round-robin scheme seems to be optimized
> for linear raid0?

Not at all - sequential writes of large files are optimised to
maintain high sequential *read* rates of the data that is being
written. Also, RAID 0 and RAID 6 have exactly the same characteristics
for this workload, so the behaviour you are seeing is more likely due
to XFS writing to slower areas of the disks when more streams are
running in more AGs. i.e. 900MB/s might be what you get at the outer
edge of the disks, but you might only get 500MB/s at the inner edges.
When writing into 4 AGs at once, they are not all going to the outer
edge, and hence you see a much truer reflection of the speed of your
storage than the single AG case.

Keep in mind the inode64 AG selection algorithm is optimised to spread
the allocation load out over the entire filesystem address space via
rotating the directory structure. It does this to increase allocation
parallelism, reduce filesystem hotspots and improve individual
locality of disparate sets of data, and in general it is significantly
faster than any other AG selection algorithm that anyone has managed
to come up with.

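As a rough illustration of the outer-edge versus inner-edge argument
above, here is a toy model in Python. Everything in it is an assumption
made for illustration: throughput is taken to fall off linearly from
900MB/s at the outer edge to 500MB/s at the inner edge (the example
figures used above, not measurements), AG n is taken to start at
fraction n/agcount of the address space with low addresses on the outer
edge, and the storage is assumed to split its time evenly between the
streams. The function names are hypothetical. It ignores seeks and
controller caching entirely and is not a model of the XFS allocator.

```python
def zone_throughput(position, outer_mbps=900.0, inner_mbps=500.0):
    """Sequential throughput (MB/s) at a fractional disk position.

    position 0.0 is the outer edge, 1.0 the inner edge. The linear
    fall-off between the two rates is an assumption.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be within [0.0, 1.0]")
    return outer_mbps + (inner_mbps - outer_mbps) * position


def ag_position(agno, agcount):
    """Fractional start position of an AG, assuming equal-sized AGs
    laid out linearly across the address space."""
    if not 0 <= agno < agcount:
        raise ValueError("agno out of range")
    return agno / agcount


def mean_throughput(agnos, agcount):
    """Aggregate rate when the storage time-slices evenly between one
    stream per entry in agnos (the even split is an assumption)."""
    rates = [zone_throughput(ag_position(a, agcount)) for a in agnos]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # All 4 streams in AG 0 (one directory): everything on the outer edge.
    print(mean_throughput([0, 0, 0, 0], agcount=4))  # 900.0
    # One stream per AG (4 directories): spread over the address space.
    print(mean_throughput([0, 1, 2, 3], agcount=4))  # 750.0
```

Under these assumptions, spreading the streams across AGs lowers the
aggregate rate simply because some of the writes land on slower regions
of the disks, before any seek cost is counted.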
> With small AGs one could also argue, that choosing
> AGs which are not far away from each other (in respect to the number
> of blocks) also adds more parallel disk access for small and medium
> sized files.
>
> Any objections against a patch to improve the AG selection?

Define "improve". I'm interested in hearing new ideas on how we might
be able to make different allocation decisions, but changing
algorithms is not just a matter of changing code. At minimum, changing
the way allocation is done will drastically change the aging
characteristics of the filesystem, and so what might work really well
for empty filesystems (like ext4's linear allocation algorithms)
really hurts performance as filesystems get older and free space gets
less contiguous....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx