Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size

On 3/31/24 05:03, Qu Wenruo wrote:


On 2024/3/28 11:52, Sweet Tea Dorminy wrote:
Now that fiemap allows returning extent physical size, make btrfs return
the appropriate extent's actual disk size.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@xxxxxxxxxx>
[...]
@@ -3221,7 +3239,9 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
              ret = emit_fiemap_extent(fieinfo, &cache, key.offset,
                           disk_bytenr + extent_offset,
-                         extent_len, flags);
+                         extent_len,
+                         disk_size - extent_offset,

This means we will emit an entry whose physical length runs from the referenced offset to the end of the physical extent.

Considering a file layout like this:

     item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
         generation 7 type 1 (regular)
         extent data disk byte 13631488 nr 65536
         extent data offset 0 nr 4096 ram 65536
         extent compression 0 (none)
     item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
         generation 8 type 1 (regular)
         extent data disk byte 13697024 nr 4096
         extent data offset 0 nr 4096 ram 4096
         extent compression 0 (none)
     item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
         generation 7 type 1 (regular)
         extent data disk byte 13631488 nr 65536
         extent data offset 8192 nr 57344 ram 65536
         extent compression 0 (none)

For fiemap, we would get something like this:

fileoff 0, logical len 4K, phy 13631488, phy len 64K
fileoff 4K, logical len 4K, phy 13697024, phy len 4K
fileoff 8K, logical len 56K, phy 13631488 + 8K, phy len 56K

[HOW TO CALCULATE WASTED SPACE IN USER SPACE]
My concern is about the first entry. It indicates that we have wasted 60K (phy len is 64K, while logical len is only 4K).

But that information is not correct: in reality we only wasted 4K, because the remaining 56K is still referred to by file range [8K, 64K).

Do you mean that the user space program should maintain a map of each utilized physical range, and, when handling the reported file range [8K, 64K), detect that its physical range overlaps an already-seen extent and do the calculation accordingly?
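A minimal sketch of the overlap-aware accounting this question describes, assuming userspace receives, for each entry, the logical length, physical start and physical length exactly as in the v2 listing above, and that the extents are uncompressed (so logical bytes equal on-disk bytes). The struct fie_rec and the helper names are hypothetical, not part of the fiemap UAPI.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-entry record: the three values shown in the listing. */
struct fie_rec {
	uint64_t logical_len; /* file bytes covered by this entry */
	uint64_t phys;        /* reported physical start */
	uint64_t phys_len;    /* reported physical length */
};

static int cmp_phys(const void *a, const void *b)
{
	const struct fie_rec *x = a, *y = b;

	return (x->phys > y->phys) - (x->phys < y->phys);
}

/* Sum of logical lengths: bytes of file data actually referenced. */
static uint64_t logical_sum(const struct fie_rec *recs, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += recs[i].logical_len;
	return sum;
}

/*
 * Bytes covered by the union of all [phys, phys + phys_len) ranges,
 * so a physical range reported twice is only counted once.
 * Sorts recs in place by physical start.
 */
static uint64_t phys_union(struct fie_rec *recs, size_t n)
{
	uint64_t total = 0, cur_start = 0, cur_end = 0;
	int open = 0;

	qsort(recs, n, sizeof(*recs), cmp_phys);
	for (size_t i = 0; i < n; i++) {
		uint64_t s = recs[i].phys;
		uint64_t e = s + recs[i].phys_len;

		if (open && s <= cur_end) {
			/* Overlapping or adjacent: extend the current run. */
			if (e > cur_end)
				cur_end = e;
			continue;
		}
		if (open)
			total += cur_end - cur_start;
		cur_start = s;
		cur_end = e;
		open = 1;
	}
	if (open)
		total += cur_end - cur_start;
	return total;
}
```

Fed the three entries from the listing, phys_union() returns 69632 (68K of disk referenced) and logical_sum() returns 65536 (64K of file data), so the waste is 4096 bytes, the 4K figure above, rather than the 60K that a per-entry comparison suggests.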

My goal is to give an unprivileged interface for tools like compsize to figure out how much space is used by a particular set of files. They report the total disk space referenced by the provided list of files, currently by doing a tree search (CAP_SYS_ADMIN) for all the extents pertaining to the requested files and deduplicating extents based on disk_bytenr.

It seems simplest for userspace if the kernel emits the entire extent for each part of it that a file references, and lets userspace deal with deduplicating extents. This is also closest to the existing tree-search-based interface. Reporting whole extents gives userspace more flexibility in how it reports bookend extents, or shared extents, or ...

It does seem a little weird that if you request only, e.g., the 4k-16k range of that example file with fiemap, you'll get all 68k involved reported, but I can't figure out a way to fix that without having the kernel keep track of the used parts of each extent while reporting, which sounds expensive.
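A sketch of the deduplication described here, in the spirit of what compsize does with tree-search results (counting each extent once by disk_bytenr), assuming each reported entry carries the start and full on-disk size of the extent it references. struct ext_ref and referenced_disk_bytes() are hypothetical names, and the quadratic dedup loop is only for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one file reference to a whole on-disk extent. */
struct ext_ref {
	uint64_t file_off;    /* file offset of the referenced range */
	uint64_t len;         /* file bytes referenced */
	uint64_t disk_bytenr; /* start of the whole extent on disk */
	uint64_t disk_size;   /* full on-disk size of the extent */
};

/* Does this reference overlap the queried file range [start, end)? */
static int overlaps(const struct ext_ref *r, uint64_t start, uint64_t end)
{
	return r->file_off < end && r->file_off + r->len > start;
}

/*
 * Disk bytes referenced by the file range [start, end): every extent
 * touched by the range counts once, at its full on-disk size.
 */
static uint64_t referenced_disk_bytes(const struct ext_ref *refs, size_t n,
				      uint64_t start, uint64_t end)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		if (!overlaps(&refs[i], start, end))
			continue;
		/* Skip extents already counted via an earlier reference. */
		for (size_t j = 0; j < i; j++) {
			if (overlaps(&refs[j], start, end) &&
			    refs[j].disk_bytenr == refs[i].disk_bytenr) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			total += refs[i].disk_size;
	}
	return total;
}
```

With the three items from the example file, querying [0, 64K) yields 69632 bytes (68K) for 64K of file data. Querying only [4K, 16K) still yields 69632, which is the 68k oddity described above: the range touches the whole 64K extent plus the 4K one.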

You're right that I was being inconsistent in subtracting extent_offset from the reported disk size when that isn't what I should be doing, so I fixed that in v3.


[COMPRESSION REPRESENTATION]
The biggest problem other than the complexity in user space is the handling of compressed extents.

Should we return the physical bytenr (disk_bytenr of the file extent item) directly, or with the extent offset added? Either way, it doesn't look consistent to me compared to non-compressed extents.
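To make the compressed-extent question concrete: a hypothetical C mirror of the relevant btrfs_file_extent_item fields, plus a check of whether disk_bytenr + offset even lands inside the on-disk extent. The struct, the helper names and the compressed numbers in the usage note are illustrative assumptions, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the btrfs_file_extent_item fields discussed. */
struct file_extent {
	uint64_t disk_bytenr;    /* start of the on-disk extent */
	uint64_t disk_num_bytes; /* on-disk size (compressed size if compressed) */
	uint64_t offset;         /* offset into the uncompressed extent data */
	uint64_t num_bytes;      /* file bytes referenced by this item */
	uint64_t ram_bytes;      /* uncompressed size of the whole extent */
	bool compressed;
};

/* Candidate report: physical start with the in-extent offset added. */
static uint64_t phys_with_offset(const struct file_extent *fe)
{
	return fe->disk_bytenr + fe->offset;
}

/* Does addr fall within [disk_bytenr, disk_bytenr + disk_num_bytes)? */
static bool inside_disk_extent(const struct file_extent *fe, uint64_t addr)
{
	return addr >= fe->disk_bytenr &&
	       addr < fe->disk_bytenr + fe->disk_num_bytes;
}
```

For item 8 of the example file (uncompressed, offset 8K), disk_bytenr + offset is 13639680, inside the extent and pointing at the referenced data. For a hypothetical compressed extent with 128K of uncompressed data stored in 16K on disk, an offset of 64K puts disk_bytenr + offset 48K past the end of the on-disk extent, while reporting plain disk_bytenr drops the offset entirely.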


As I understand it, the goal of reporting the physical bytenr is to provide a number which we could theoretically resolve into a disk location (or a few) if we cared, but which doesn't necessarily have any physical meaning. To quote the fiemap documentation: "It is always undefined to try to update the data in-place by writing to the indicated location without the assistance of the filesystem". So I think I'd prefer to always report the entire size of the entire extent being referenced.

[ALTERNATIVE FORMAT]
The other alternative would be to follow the btrfs on-disk format: provide a unique physical bytenr for each file extent, then the offset/referred length inside the uncompressed extent.

That would handle compressed and regular extents more consistently, and be a little easier for user space tools to handle (really just a tiny bit easier, since no range overlap check is needed), but it is more complex to represent, and I'm not sure other filesystems would be happy to accept extra members they don't care about.

I really want to make sure that this interface reports the unused space in, e.g., bookend extents well. compsize has been an important tool for me in this respect, for example when a 10G file was taking up 110G of actual disk space. If we report the entire length of the entire extent, then when used on whole files one can establish the space referenced by that file but not used, and similarly for multiple files. So while I like the simplicity of just reporting the used length, I don't think there's a way to make compsize unprivileged with that approach.

Thank you!!



