Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size

Andreas Dilger <adilger@xxxxxxxxx> · Tue, 9 Apr 2024 13:31:18 -0600

On Apr 9, 2024, at 12:52 PM, David Sterba <dsterba@xxxxxxx> wrote:
> 
> On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
>> 
>> 
>> On 2024/4/3 16:32, Sweet Tea Dorminy wrote:
>>>>> This means we will emit an entry that extends to the end of the
>>>>> physical extent.
>>>>> 
>>>>> Considering a file layout like this:
>>>>> 
>>>>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 0 nr 4096 ram 65536
>>>>>          extent compression 0 (none)
>>>>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>>>          generation 8 type 1 (regular)
>>>>>          extent data disk byte 13697024 nr 4096
>>>>>          extent data offset 0 nr 4096 ram 4096
>>>>>          extent compression 0 (none)
>>>>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 8192 nr 57344 ram 65536
>>>>>          extent compression 0 (none)
>>>>> 
>>>>> For fiemap, we would get something like this:
>>>>> 
>>>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phy len 56k
>>>>> 
>>>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>>>> My concern is with the first entry. It indicates that we have wasted
>>>>> 60K (phy len is 64K, while logical len is only 4K).
>>>>> 
>>>>> But that information is not correct: in reality we only wasted 4K;
>>>>> the remaining 56K is still referenced by file range [8K, 64K).
>>>>> 
>>>>> Do you mean that the user space program should maintain a mapping of
>>>>> each utilized physical range, and when handling the reported file
>>>>> range [8K, 64K), find that the physical range overlaps an existing
>>>>> extent and do the calculation correctly?
>>>> 
>>>> My goal is to give an unprivileged interface for tools like compsize
>>>> to figure out how much space is used by a particular set of files.
>>>> They report the total disk space referenced by the provided list of
>>>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>>>> extents pertaining to the requested files and deduplicating extents
>>>> based on disk_bytenr.
>>>> 
>>>> It seems simplest for userspace if the kernel emits the entire
>>>> extent for each part of it referenced in a file, and lets userspace
>>>> deal with deduplicating extents. This is also most similar to the
>>>> existing tree-search based interface. Reporting whole extents gives
>>>> userspace more flexibility in how to report bookend extents, shared
>>>> extents, or ...
>>>> 
>>>> It does seem a little weird that if you request, e.g., only the
>>>> 4k-16k range of that example file with fiemap, you'll get all 68k
>>>> involved reported, but I can't figure out a way to fix that without
>>>> having the kernel keep track of the used parts of the extents while
>>>> reporting, which sounds expensive.
>>>> 
>>>> You're right that I'm being inconsistent: I was subtracting
>>>> extent_offset from the reported disk size when I shouldn't have
>>>> been, so I fixed that in v3.
>>> 
>>> Ah, I think I grasp a point I'd missed before.
>>> - Without setting disk_bytenr to the actual start of the data on disk,
>>> there's no way to find the location of the actual data on disk within
>>> the extent from fiemap alone
>> 
>> Yes, that's my point.
>> 
>>> - But reporting disk_bytenr + offset, to get the actual start of the
>>> data on disk, means we also need to report a physical size to find
>>> the end of the extent, and we still can't know where it begins.
>> 
>> disk_bytenr, offset and disk_num_bytes, together with the existing
>> fields like length (aka num_bytes), filepos (aka key.offset) and flags
>> (compression/hole/preallocated etc.), give us everything we need to
>> know for regular extents.
>> 
>> For compressed extents, we also need ram_bytes.
>> 
>> If you ask me, I'd say put all the extra members into the fiemap entry
>> if we have the space...
>> 
>> It would be u64 * 4 if we go 1:1 with the file extent items; otherwise
>> we could skimp on offset and ram_bytes (u32 is enough for btrfs at
>> least), in which case it would be u64 * 2 + u32 * 2.
>> 
>> But I'm also 100% sure the extra members would not be welcomed by other
>> filesystems.
> 
> That's probably right: it is too much btrfs-specific information for the
> generic FIEMAP. But we could also do our own enhanced fiemap ioctl that
> would provide all the information you suggest, and we'd be free to put
> the compression information there too.

I read this thread when it was first posted, but I don't understand what
these extra fields actually mean.  Adding the logical/physical length
definitely makes sense for compressed extents, but I didn't see any clear
explanation of what these other fields actually mean.

I'm extrapolating that btrfs has something like aggregated compressed
chunks containing multiple independent/disjoint blocks, and you are
trying to get the exact offset within the compressed byte stream for the
start of each block in the chunk?

Cheers, Andreas
