On Apr 9, 2024, at 12:52 PM, David Sterba <dsterba@xxxxxxx> wrote:
>
> On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
>>
>>
>> On 2024/4/3 16:32, Sweet Tea Dorminy wrote:
>>>>> This means, we will emit an entry that uses the end to the physical
>>>>> extent end.
>>>>>
>>>>> Considering a file layout like this:
>>>>>
>>>>> item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>>>     generation 7 type 1 (regular)
>>>>>     extent data disk byte 13631488 nr 65536
>>>>>     extent data offset 0 nr 4096 ram 65536
>>>>>     extent compression 0 (none)
>>>>> item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>>>     generation 8 type 1 (regular)
>>>>>     extent data disk byte 13697024 nr 4096
>>>>>     extent data offset 0 nr 4096 ram 4096
>>>>>     extent compression 0 (none)
>>>>> item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>>>     generation 7 type 1 (regular)
>>>>>     extent data disk byte 13631488 nr 65536
>>>>>     extent data offset 8192 nr 57344 ram 65536
>>>>>     extent compression 0 (none)
>>>>>
>>>>> For fiemap, we would get something like this:
>>>>>
>>>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>>>
>>>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>>>> My concern is on the first entry. It indicates that we have wasted
>>>>> 60K (phy len is 64K, while logical len is only 4K)
>>>>>
>>>>> But that information is not correct, as in reality we only wasted 4K,
>>>>> the remaining 56K is still referred by file range [8K, 64K).
>>>>>
>>>>> Do you mean that user space program should maintain a mapping of each
>>>>> utilized physical range, and when handling the reported file range
>>>>> [8K, 64K), the user space program should find that the physical range
>>>>> covers with one existing extent, and do calculation correctly?
>>>>
>>>> My goal is to give an unprivileged interface for tools like compsize
>>>> to figure out how much space is used by a particular set of files.
>>>> They report the total disk space referenced by the provided list of
>>>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>>>> extents pertaining to the requested files and deduplicating extents
>>>> based on disk_bytenr.
>>>>
>>>> It seems simplest to me for userspace for the kernel to emit the
>>>> entire extent for each part of it referenced in a file, and let
>>>> userspace deal with deduplicating extents. This is also most similar
>>>> to the existing tree-search based interface. Reporting whole extents
>>>> gives more flexibility for userspace to figure out how to report
>>>> bookend extents, or shared extents, or ...
>>>>
>>>> It does seem a little weird where if you request with fiemap only e.g.
>>>> 4k-16k range in that example file you'll get reported all 68k
>>>> involved, but I can't figure out a way to fix that without having the
>>>> kernel keep track of used parts of the extents as part of reporting,
>>>> which sounds expensive.
>>>>
>>>> You're right that I'm being inconsistent, taking off extent_offset
>>>> from the reported disk size when that isn't what I should be doing, so
>>>> I fixed that in v3.
>>>
>>> Ah, I think I grasp a point I'd missed before.
>>> - Without setting disk_bytenr to the actual start of the data on disk,
>>> there's no way to find the location of the actual data on disk within
>>> the extent from fiemap alone
>>
>> Yes, that's my point.
>>
>>> - But reporting disk_bytenr + offset, to get actual start of data on
>>> disk, means we need to report a physical size to figure out the end of
>>> the extent and we can't know the beginning.
>>
>> disk_bytenr + offset + disk_num_bytes, and with the existing things like
>> length (aka, num_bytes), filepos (aka, key.offset), flags
>> (compression/hole/preallocated etc), we have everything we need to know
>> for regular extents.
>>
>> For compressed extents, we also need ram_bytes.
>>
>> If you ask me, I'd say put all the extra members into fiemap entry if we
>> have the space...
>>
>> It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
>> may go cheap on offset and ram_bytes (u32 is enough for btrfs at least),
>> in that case it would be u64 * 2 + u32 * 2.
>>
>> But I'm also 100% sure, the extra members would not be welcomed by other
>> filesystems either.
>
> That's probably right, too much btrfs-specific information in the
> generic FIEMAP, but we may also do our own enhanced fiemap ioctl that
> would provide all the information you suggest and we'd be free to put
> the compression information there too.

I read this thread when it was first posted, but I don't understand
what these extra fields actually mean. Adding the logical/physical
length definitely makes sense for compressed extents, but I didn't see
any clear explanation of what the other fields mean.

My guess is that btrfs has aggregated compressed chunks that contain
multiple independent/disjoint blocks, and that you are trying to get
the exact offset within the compressed byte stream for the start of
each block in the chunk. Is that right?

Cheers, Andreas
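For anyone following along, here is a rough userspace sketch of the accounting Qu describes above: group entries by disk_bytenr, merge the referenced ranges inside each extent, and subtract from the extent sizes. It is only an illustration, not what compsize or any proposed ioctl actually does. The names summarize_extents and EXAMPLE_LAYOUT are made up for this sketch, the input tuples are assumed to come from something like a tree search (today's FIEMAP does not report all four values), and only uncompressed extents are handled.

```python
from collections import defaultdict


def summarize_extents(entries):
    """Return (allocated, referenced, wasted) in bytes.

    entries: iterable of (disk_bytenr, disk_num_bytes, extent_offset, num_bytes),
    one tuple per file extent item. Hypothetical input format; assumes
    uncompressed extents, so extent_offset indexes directly into the disk extent.
    """
    extent_sizes = {}          # disk_bytenr -> size of the extent on disk
    used = defaultdict(list)   # disk_bytenr -> referenced [start, end) ranges

    for disk_bytenr, disk_num_bytes, extent_offset, num_bytes in entries:
        extent_sizes[disk_bytenr] = disk_num_bytes
        used[disk_bytenr].append((extent_offset, extent_offset + num_bytes))

    # Each on-disk extent is counted once, however many items point at it.
    allocated = sum(extent_sizes.values())

    referenced = 0
    for ranges in used.values():
        ranges.sort()
        cur_start, cur_end = ranges[0]
        for start, end in ranges[1:]:
            if start <= cur_end:
                # Overlapping or adjacent reference: extend the current run.
                cur_end = max(cur_end, end)
            else:
                referenced += cur_end - cur_start
                cur_start, cur_end = start, end
        referenced += cur_end - cur_start

    return allocated, referenced, allocated - referenced


# The three file extent items from the example layout quoted above.
EXAMPLE_LAYOUT = [
    (13631488, 65536, 0, 4096),
    (13697024, 4096, 0, 4096),
    (13631488, 65536, 8192, 57344),
]

allocated, referenced, wasted = summarize_extents(EXAMPLE_LAYOUT)
print(allocated, referenced, wasted)  # 69632 65536 4096
```

With the example layout this gives 68K allocated and 4K wasted, which matches the "in reality we only wasted 4K" figure, rather than the 60K a naive reading of the first fiemap entry would suggest.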