Re: FIDEDUPERANGE claims to succeed for non-identical files

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 4, 2023 at 6:25 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Thu, Dec 22, 2022 at 03:41:30PM +0100, Zbigniew Halas wrote:
> > On Thu, Dec 22, 2022 at 9:25 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > >
> > > Thanks for the analysis.
> > > Would you be interested in trying to fix the bug and writing a test?
> > > I can help if you would like.
> >
> > I can give it a try unless it turns out that some deep VFS changes are
> > required, but let's try to narrow down the reasonable API semantics
> > first.
> >
> > > It's hard to follow all the changes since
> > > 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs")
> > > in v4.5, but it *looks* like this behavior might have been in btrfs,
> > > before the ioctl was promoted to vfs.. not sure.
> > >
> > > We have fstests coverage for the "good" case of same size src/dst
> > > (generic/136), but I didn't find a test for the non-same size src/dst.
> > >
> > > In any case, vfs_dedupe_file_range_one() and ->remap_file_range()
> > > do not even have an interface to return the actual bytes_deduped,
>
> The number of bytes remapped is passed back via the loff_t return value
> of ->remap_file_range.  If CAN_SHORTEN is set, the VFS and the fs
> implementation are allowed to reduce the @len parameter as much as they
> want.  TBH I'm mystified why the original btrfs dedupe ioctl wouldn't
> allow deduplication of common prefixes, which means that len only gets
> shortened to avoid weird problems when dealing with eof being in the
> middle of a block.
>
> (Or not, since there's clearly a bug.)
>
> > > so I do not see how any of the REMAP_FILE_CAN_SHORTEN cases
> > > are valid, regardless of EOF.
> >
> > Not sure about this, it looks to me that they are actually returning
> > the number of bytes deduped, but the value is not used, but maybe I'm
> > missing something.
> > Anyway I think there are valid cases when REMAP_FILE_CAN_SHORTEN makes sense.
> > For example if a source file content is a prefix of a destination file
> > content and we want to dedup the whole range of the source file
> > without REMAP_FILE_CAN_SHORTEN,
> > then the ioctl will only succeed when the end of the source file is at
> > the block boundary, otherwise it will just fail. This will render the
> > API very inconsistent.
>
> <nod> I'll try to figure out where the len adjusting went wrong here and
> report back.

Hey Darrick,

Len adjustment happens in generic_remap_checks, specifically here:
    if (pos_in + count == size_in &&
        (!(remap_flags & REMAP_FILE_DEDUP) || pos_out + count == size_out)) {
        bcount = ALIGN(size_in, bs) - pos_in;
    } else {
        if (!IS_ALIGNED(count, bs))
            count = ALIGN_DOWN(count, bs);
        bcount = count;
    }
the problem is that in this particular case length is set to zero, and
it makes generic_remap_file_range_prep to succeed.
I think that 2 things are needed to make this API behave reasonably:
* always use the original length for the file content comparison, not
the truncated one (there currently is also a short circuit for zero
length that skips comparison, this would need to be revisited as well)
- otherwise we may report successful deduplication, despite the ranges
being clearly different and only the truncated ranges being the same.
* report real, possibly truncated deduplication length to the userland

Cheers,
Zbigniew



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux