On Jun 05, 2008 01:22 -0400, Martin K. Petersen wrote:
> This is just a proof of concept set of patches.  I'd like some
> feedback before I spend more time on them.
>
> At the Filesystem & Storage Workshop there was lots of discussion
> about how to communicate I/O alignment, stripe width, etc. to the
> filesystems so they could lay out things properly.

Thanks for looking into this Martin.  We could use this pretty
immediately in ext4 for helping the block allocator make good decisions.

> An addition to the up-and-coming version of the SCSI block protocol
> features an inquiry page that hardware RAIDs can use to indicate
> preferred I/O sizes for a given LUN.
>
> This patch kit implements support for exporting those values in
> /sys/block/.  I have implemented support for it in sd.c using the
> Block Limits VPD and in MD using chunk size and stripe width.
>
> The physical sector offset for the start of the "virtual" block device
> is also exported.  This includes partitions so you can get the actual
> physical start sector offset for - say - an MD device sitting on a
> partitioned set of drives.

The kernel part of the code seems pretty reasonable (nicely stackable,
as your MD examples show) and useful for filesystems.  Having this
information available in the kernel removes much of the need to find
this information in userspace, but unfortunately not all of it (e.g.
some mkfs-time layout decisions need to be made before the filesystem
is mounted, even if the allocator can use the kernel-supplied hints).

To be honest, however, having the information exported only via sysfs
is a bit ugly IMHO.  I've had all sorts of grief with settings there
because there isn't always a match between the device that is being
specified by the user and what appears in sysfs (e.g.
/dev/disk/by-id/foo doesn't match /sys/block/sda) and hoops have to be
jumped through to find this mapping, before parsing a text value in C.
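To illustrate the hoops mentioned above, here is a minimal sketch of how
a userspace tool might get from the name the user typed to a sysfs
directory and then decode an attribute.  It assumes a kernel that exposes
/sys/dev/block/MAJOR:MINOR symlinks, which is not guaranteed; without
them the tool has to walk /sys/block and compare each "dev" file.  The
helper names are made up for this example and the code is Linux/glibc
specific.

```c
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() on glibc */
#include <sys/types.h>

/* Format the sysfs directory for a device number.
 * ASSUMPTION: the kernel provides /sys/dev/block/MAJOR:MINOR symlinks.
 * Returns 0 on success, -1 if the buffer is too small. */
static int sysfs_dir_for_devnum(dev_t dev, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/dev/block/%u:%u",
                     major(dev), minor(dev));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Resolve whatever name the user gave (e.g. a /dev/disk/by-id/ symlink)
 * to a sysfs directory.  stat() follows symlinks, so st_rdev is the
 * device number of the real node.
 * Returns -1 if the path is missing or is not a block device. */
static int sysfs_dir_for_node(const char *node, char *buf, size_t len)
{
    struct stat st;

    if (stat(node, &st) != 0 || !S_ISBLK(st.st_mode))
        return -1;
    return sysfs_dir_for_devnum(st.st_rdev, buf, len);
}

/* Parse the text content of a sysfs attribute as an unsigned decimal
 * integer.  The value must start with a digit and may be followed by a
 * single trailing newline, which is how sysfs normally presents it. */
static int parse_sysfs_value(const char *text, unsigned long long *out)
{
    char *end;

    if (!isdigit((unsigned char)text[0]))
        return -1;
    errno = 0;
    *out = strtoull(text, &end, 10);
    if (errno != 0)
        return -1;
    if (*end == '\n')
        end++;
    return (*end == '\0') ? 0 : -1;
}
```

A caller would pass the user-supplied path to sysfs_dir_for_node(),
append an attribute name to the result, read that file, and hand the
text to parse_sysfs_value().  That is three fallible steps to obtain
one integer.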
Having an ioctl() that can be called on the block device (getting the
right device regardless of its name) seems a lot more useful to
applications in my experience, unless you are using a script.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.