On May 29, 2008 09:01 -0400, Christoph Hellwig wrote:
> On Wed, May 28, 2008 at 10:09:31AM -0600, Andreas Dilger wrote:
> > ... but I don't think it should necessarily be _required_ to return a
> > real "dev_t" (major, minor) device. For network filesystems this is
> > meaningless. If it is possible for FIEMAP_EXTENT_NET to signal that the
> > device is not a local/physical device (where a dev_t has no meaning),
> > and simply allow an enumeration [0, 1, 2, ...] of the logical devices
> > then I think this is reasonable. The mapping of logical devices to
> > servers is available separately with a Lustre-specific ioctl.
> >
> > This passes more information for filesystems that have local devices
> > while not breaking the functionality for network filesystems and could
> > be used as an efficient replacement for lilo's use of FIBMAP.
>
> A dev_t actually means something for the only in-tree users of
> this interface, so there's no point making this interface worse for
> some long-term out of tree code. And it's not like you simply can't
> allow multiple anonymous blockdevices for your networked filesystems
> similar to the one used for st_dev already.

But requiring that 1500 anonymous blockdevices (== number of storage
targets) be created at mount time, while exporting some
varying-over-reboot and inconsistent-across-clients random-value dev_t
for network filesystems, just for the possibility that the client is
going to do FIEMAP, isn't making the interface better either...

Getting devices of [0x1908afed, 0x4058204b] back from FIEMAP of a file
on one client, and [0x4bac5821, 0x0abefd63] on another client, is pretty
useless compared to devices [2, 4], which have very clear meanings, will
always be the same across all clients, and are the same across reboots.

> > For RAID1/10 you can return multiple logical->physical extent mappings
> > for the same logical range of the file with different "device" IDs. You
> > could do the same for RAID5 returning each of the data and parity chunks
> > with "NO_DIRECT" if desired (maybe only on the parity extent, or don't
> > return the parity extent at all). The spec does not require that the
> > returned extents be non-overlapping.
>
> Umm, no. That just makes the interface too complicated. I can bet
> with you that userspace programmers will generally only test their code
> with simple filesystems and hell will break loose when they get these
> multiple ranges. Especially as that's a very unnatural interface.

The metadata information isn't exposed to callers by default; they have
to request it explicitly with e.g. FIEMAP_FLAG_METADATA. For the most
common use cases, applications/users will care about:

a) for cp/tar/dd/etc they only want to know where there are holes. This
   is available in the most simple instance of FIEMAP (no flags).
b) for "fiemap" the user will want to know whether there are large or
   small contiguous allocations/fragmentation, or just the extent count.
c) for sophisticated users (e.g. filesystem developers, performance
   tuning) they want to know the extent information, the metadata
   layout, and possibly the mapping all the way down to the platters.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
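
To illustrate use cases (a) and (b) above, here is a minimal userspace sketch of a consumer that tolerates several extents covering the same logical range, as a mirrored layout might return. Everything in it is an assumption made for illustration: the Extent record, find_holes() and extent_count_by_device() are hypothetical names, not part of the proposed interface, and the extents are built by hand rather than obtained from the ioctl.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Extent:
    """Hypothetical userspace view of one mapped extent (not the kernel struct)."""
    logical: int  # byte offset within the file
    length: int   # extent length in bytes
    device: int   # logical device index, e.g. 2 or 4


def find_holes(extents: List[Extent], file_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each unmapped range in [0, file_size).

    Extents may overlap in logical space (e.g. two mirrors of the same
    range), so track the furthest mapped offset instead of assuming each
    extent starts where the previous one ended.
    """
    holes = []
    covered = 0  # every byte below this offset is mapped or already reported
    for ext in sorted(extents, key=lambda e: e.logical):
        if ext.logical >= file_size:
            break
        if ext.logical > covered:
            holes.append((covered, ext.logical - covered))
        covered = max(covered, ext.logical + ext.length)
    if covered < file_size:
        holes.append((covered, file_size - covered))
    return holes


def extent_count_by_device(extents: List[Extent]) -> Dict[int, int]:
    """Count extents per logical device index."""
    counts: Dict[int, int] = {}
    for ext in extents:
        counts[ext.device] = counts.get(ext.device, 0) + 1
    return counts


# Hand-built example: a sparse 16 KiB file mirrored on logical devices 2 and 4.
mirrored = [
    Extent(0, 4096, 2), Extent(0, 4096, 4),
    Extent(8192, 4096, 2), Extent(8192, 4096, 4),
]
print(find_holes(mirrored, 16384))
print(extent_count_by_device(mirrored))
```

A tool that only cares about holes gets the same answer whether or not the mirror extents are present, since only the furthest covered offset matters. The small device indices also stay meaningful when compared across clients.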