Andreas Dilger wrote:
> So, I think we need another __u64 in the fiemap_extent which is fe_loglength, and rename fe_length to fe_physlength.
As some people guessed, the "experience with a non-linux filesystem that has a similar API" in my earlier post ([PATCH 0/5]) refers to AdvFS on Tru64. I was asked to provide more information about Tru64's equivalent to fiemap. I believe the person who asked wanted to get closure on the fiemap definition, but I'm probably just going to throw more gasoline around and light the match with this :)

My earlier post described how I thought our code worked. As usual, when I describe something without looking at the code and then really go look at it, I find it does something else and I'm saying "damn, I didn't think it was that ugly". Well, it probably is ugly, and it is really 4 different interfaces, but after thinking about it I realized each of the 4 interface designs is KISS-defensible as optimal for its intended use.

Here is the 10-year-old "most used API" for userspace code:

    #define F_GETMAP 21  /* retrieve a file's sparseness map */

    struct extentmapentry {
        unsigned long offset;
        unsigned long size;
    };

    struct extentmap {
        unsigned long arraysize;
        unsigned long numextents;
        unsigned long offset;
        struct extentmapentry *extent;
    };

    fcntl(fd, F_GETMAP, &extentmap);

Backup/dump tools call this fcntl() to retrieve the sparseness map of an AdvFS or UFS file; NFS and CD filesystems return an error. Its intent is to return the LOGICAL extent map of a file, without regard to the file's physical extent map. Multiple extents are returned only if the file is sparse, and all logically contiguous extents are collapsed into a single extent.

FYI, "longs" are 64 bits on Tru64. extentmapentry.offset is the byte offset in the file and extentmapentry.size is the number of bytes in the extent. Only allocated data extents are returned, so there is no need for an "extent type". The extentmapentry is designed to be small so that minimal memory is needed when a file is highly sparse-fragmented.
extentmap.arraysize is really max_extents on input; on output it is "how_many_more", the number of extents still present after the "numextents" (out) extents returned in the *extent array. The (so ugly) part I forgot is that extentmap.offset is NOT a "starting byte in file"; it is "skip over this many data extents". That is not an intuitive API, but then I realized it is precisely the best one for a backup program.

The backup always reads the complete file from 0..filesize, it wants to duplicate sparse regions as sparse (or at least not read them from the disk), and it needs to use a reasonably sized extent array, so it walks forward in a loop (e.g. getting 4 extents per call: 0..3, 4..7, 8..11). So extentmap.offset as an index into the file's logical map makes sense, and you don't need to worry about a start-at-byte-in-file that is not the start of an extent. A program that wanted to optimize random reads of a sparse file could do so with this API, though not as easily as if it had a start_byte input parameter.

I'm not going to bore you with the other 3 interfaces, supported only in AdvFS, that retrieve RAW extent maps for the cluster and filesystem administration tools. These are the only interfaces that return the device allocation of extents, because normal applications, including backup, need to do their data access through the filesystem. The bottom line is that filesystem-specific information is only really valuable to filesystem-specific tools.

=== LIGHT THE MATCH ===

- I don't want linux to implement Tru64 F_GETMAP for fiemap!
- The lesson is that a simple design covers the major use, and other complicated needs are handled somewhere else.
- I have talked to Mark and he has tools waiting to use the features he originally designed into fiemap... but every day there is a new flag or return field added "just in case".
- I know "memory is cheap", but we still seem to run out of it, so expanding every return structure with data that may only be useful to a specific filesystem seems like a bad idea.
- A simplified filesystem-independent version plus a separate, complex-as-you-want filesystem-dependent API might be better. For example:
  * We can't even agree on what "device" is.
  * What good is "encrypted" or "compressed" without "how"?

=== THROW ON MORE GASOLINE ===

Subject: [RFC] add FIEMAP ioctl to efficiently map file allocation
From: Andreas Dilger <adilger () clusterfs ! com>
Date: 2007-04-12 11:05:50
Message-ID: 20070412110550.GM5967 () schatzie ! adilger ! int

> we additionally need to get the mapping over the network so it needs to be efficient in terms of how data is passed, and how easily it can be extracted from the filesystem.

jim