On May 25, 2008 15:42 -0400, Christoph Hellwig wrote:
> On Sat, May 24, 2008 at 05:01:48PM -0700, Mark Fasheh wrote:
> > * FIEMAP_FLAG_HSM_READ
> > If the extent is offline, retrieve it before mapping and do not flag
> > it as FIEMAP_EXTENT_SECONDARY. This flag has no effect if the file
> > system does not support HSM.
>
> Given that there's no HSM support in mainline this should not be added.
> It'll be useful once we add proper HSM support, though :)

This was added at the request of David for XFS, because the XFS bmap
ioctl defaults to reading in extents from HSM. I don't have any
attachment to it myself.

> > * FIEMAP_FLAG_LUN_ORDER
> > If the file system stripes file data, this will return contiguous
> > regions of physical allocation, sorted by LUN. Logical offsets may not
> > make sense if this flag is passed. If the file system does not support
> > multiple LUNs, this flag will be ignored.
>
> A LUN doesn't make any sense in filesystem context. That's a
> scsi-centric acronym that doesn't even make sense in a scsi-centric
> filesystem universe because a LUN can of course contain multiple
> partitions. It's also extremely ill-defined when using volume managers.

What else do you propose calling this? It isn't a LUN in the SCSI sense
of course, but there is definitely a need to be able to identify multiple
disks. Regardless of whether there is a single disk or multiple disks
involved, it is generally called a LUN. It is better than calling it
a "disk" or a "partition".

> There's also no filesystems that actually support a single file on
> multiple devices in mainline, the only filesystem that supports multiple
> data devices at all (XFS) requires each file to be on a single device.
>
> Once we have a filesystem with real multiple data device support like
> btrfs or a future XFS version we can worry about this and define
> a different ioctl for it.

I don't see why we need a different ioctl for mapping extents on a
filesystem that supports direct access to multiple disks. Having one
mechanism that returns the file mapping is much simpler for user space
applications (filefrag, cp, tar, gzip, etc) than having to use different
ioctls for different backing filesystems.

> > Each extent is described by a single fiemap_extent structure as
> > returned in fm_extents.
> >
> > struct fiemap_extent {
> >     __u64 fe_logical;  /* logical offset in bytes for the start of
> >                         * the extent */
> >     __u64 fe_physical; /* physical offset in bytes for the start
> >                         * of the extent */
> >     __u64 fe_length;   /* length in bytes for the extent */
> >     __u32 fe_flags;    /* returned FIEMAP_EXTENT_* flags for the extent */
> >     __u32 fe_lun;      /* logical device number for extent (starting at 0)*/
>
> Again this lun thing is horribly ill-defined. There is no such thing
> as a logical device number in our filesystem terminology.

Propose a better name then, but the need for it will not go away.
This is needed for Lustre, btrfs, pNFS, etc. The whole point of
developing this API and getting input from all of the main filesystems
was to have a single common interface that could be used by all
filesystems.

> > struct fiemap_extent_info {
> >     unsigned int fi_flags;          /* Flags as passed from user */
> >     unsigned int fi_extents_mapped; /* Number of mapped extents */
> >     unsigned int fi_extents_max;    /* Size of fiemap_extent array */
> >     char *fi_extents_start;         /* Start of fiemap_extent array */
> > };
>
> Why is this passed a structure instead of individual arguments?

Saves on passing this around as arguments on the stack? Also, for ext4
there is an iterator function which needs a private data struct passed,
and it doesn't make sense to require duplicating all of this information
again.

> Also why isn't fi_extents_start properly typed?

I was wondering about that, I'm not sure why Mark implemented it that way.
I would have thought that it should be a struct fiemap_extent *.
Perhaps it is meant to allow for misaligned userspace pointers, but I'm
not sure.

> > If the request has the FIEMAP_FLAG_NUM_EXTENTS flag set, then calling
> > this helper is not necessary and fi_extents_mapped can be set
> > directly.
>
> Sounds like the count number of extents request should be a separate
> ioctl and separate filesystem entry point instead of overloading FIEMAP.

I don't see that at all. The operations that the filesystem has to do
are basically the same whether it is counting extents or returning them.
All that would result from having separate ioctl and filesystem methods
would be a lot of code duplication.

The fiemap_fill_next_extents() call will handle the NUM_EXTENTS operation
internally, and the filesystem code doesn't need to special-case this at
all. The only time the NUM_EXTENTS case would be handled specially by the
filesystem is if it tracks the count of extents itself for some reason.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
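
To illustrate the last point, that counting and returning extents can
share one code path, here is a rough userspace sketch. It is not the
kernel helper: the sketch_* names, the flag value, and the made-up extent
layout in sketch_fs_fiemap() are all assumptions for illustration. Only
the struct fields come from the proposal quoted above, and the array
pointer is typed here as suggested in the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag value, chosen only for this sketch. */
#define SKETCH_FLAG_NUM_EXTENTS 0x1u

/* Same fields as the quoted fiemap_extent, using userspace types. */
struct sketch_extent {
    uint64_t fe_logical;
    uint64_t fe_physical;
    uint64_t fe_length;
    uint32_t fe_flags;
    uint32_t fe_lun;
};

/* Same fields as the quoted fiemap_extent_info, with a typed pointer. */
struct sketch_extent_info {
    unsigned int fi_flags;
    unsigned int fi_extents_mapped;
    unsigned int fi_extents_max;
    struct sketch_extent *fi_extents_start;
};

/*
 * Record one extent. In counting mode only the counter is bumped and the
 * array is never touched. Returns 1 once the array is full, 0 otherwise.
 */
int sketch_fill_next_extent(struct sketch_extent_info *info,
                            uint64_t logical, uint64_t physical,
                            uint64_t length, uint32_t flags, uint32_t lun)
{
    assert(info != NULL);

    if (info->fi_flags & SKETCH_FLAG_NUM_EXTENTS) {
        info->fi_extents_mapped++;
        return 0;
    }

    if (info->fi_extents_mapped >= info->fi_extents_max)
        return 1;

    struct sketch_extent *dst =
        &info->fi_extents_start[info->fi_extents_mapped];
    dst->fe_logical = logical;
    dst->fe_physical = physical;
    dst->fe_length = length;
    dst->fe_flags = flags;
    dst->fe_lun = lun;

    info->fi_extents_mapped++;
    return info->fi_extents_mapped == info->fi_extents_max;
}

/*
 * Stand-in for a filesystem's extent walk over a file with nr_extents
 * 4096-byte extents at invented physical offsets. Note that it has no
 * special case for counting mode.
 */
void sketch_fs_fiemap(struct sketch_extent_info *info,
                      unsigned int nr_extents)
{
    for (unsigned int i = 0; i < nr_extents; i++) {
        uint64_t logical = (uint64_t)i * 4096;
        uint64_t physical = 1048576 + (uint64_t)i * 8192;

        if (sketch_fill_next_extent(info, logical, physical, 4096, 0, 0))
            break;
    }
}
```

The same walker serves both requests: with SKETCH_FLAG_NUM_EXTENTS set
and no array it only counts, and with an array of fi_extents_max entries
it stops as soon as the array is full.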