On Thu, May 30, 2019 at 12:02 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote: > > Hi everyone, > > Murphy Zhou sent a patch to generic/517 in fstests to fix a dedupe > failure he was seeing on NFS: > > On Thu, May 30, 2019 at 05:41:47PM +0800, Murphy Zhou wrote: > > NFSv4.2 could pass _require_scratch_dedupe, since the test offset and > > size are aligned, while generic/517 is performing unaligned dedupe. > > NFS does not support unaligned dedupe now, returns EINVAL. > > > > Signed-off-by: Murphy Zhou <xzhou@xxxxxxxxxx> > > --- > > tests/generic/517 | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/tests/generic/517 b/tests/generic/517 > > index 601bb24e..23665782 100755 > > --- a/tests/generic/517 > > +++ b/tests/generic/517 > > @@ -30,6 +30,7 @@ _cleanup() > > _supported_fs generic > > _supported_os Linux > > _require_scratch_dedupe > > +$FSTYP == "nfs" && _notrun "NFS can't handle unaligned deduplication" > > I was surprised to see a dedupe fix for NFS since (at least to my > knowledge) neither of these two network filesystems actually support > server-side deduplication commands, and therefore the > _require_scratch_dedupe should have _notrun the test. > > Then I looked at fs/nfs/nfs4file.c: > > static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off, > struct file *dst_file, loff_t dst_off, loff_t count, > unsigned int remap_flags) > { > <local variable declarations> > > if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY)) > return -EINVAL; > > <check alignment, lock inodes, flush pending writes> > > ret = nfs42_proc_clone(src_file, dst_file, src_off, dst_off, count); > > The NFS client code will accept REMAP_FILE_DEDUP through remap_flags, > which is how dedupe requests are sent to filesystems nowadays. The nfs > client code does not itself compare the file contents, but it does issue > a CLONE command to the NFS server. The end result, AFAICT, is that a > user program can write 'A's to file1, 'B's to file2, issue a dedup > ioctl to the kernel, and have a block of 'B's mapped into file1. That's > broken behavior, according to the dedup ioctl manpage. > > Notice how remap_flags is checked but is not included in the > nfs42_proc_clone call? That's how I conclude that the NFS client cannot > possibly be sending the dedup request to the server. > > The same goes for fs/cifs/cifsfs.c: > > static loff_t cifs_remap_file_range(struct file *src_file, loff_t off, > struct file *dst_file, loff_t destoff, loff_t len, > unsigned int remap_flags) > { > <local variable declarations> > > if (remap_flags & ~(REMAP_FILE_DEDUP | REMAP_FILE_ADVISORY)) > return -EINVAL; > > <check files, lock inodes, flush pages> > > if (target_tcon->ses->server->ops->duplicate_extents) > rc = target_tcon->ses->server->ops->duplicate_extents(xid, > smb_file_src, smb_file_target, off, len, destoff); > else > rc = -EOPNOTSUPP; > > Again, remap_flags is checked here but it has no influence over the > ->duplicate_extents call. > > Next I got to thinking that when I reworked the clone/dedupe code last > year, I didn't include REMAP_FILE_DEDUP support for cifs or nfs, because > as far as I knew, neither protocol supports a verb for deduplication. > The remap_flags checks were modified to allow REMAP_FILE_DEDUP in > commits ce96e888fe48e (NFS) and b073a08016a10 (CIFS) with this > justification (the cifs commit has a similar message): > > "Subject: Fix nfs4.2 return -EINVAL when do dedupe operation > > "dedupe_file_range operations is combiled into remap_file_range. > " But in nfs42_remap_file_range, it's skiped for dedupe operations. > " Before this patch: > " # dd if=/dev/zero of=nfs/file bs=1M count=1 > " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file > " XFS_IOC_FILE_EXTENT_SAME: Invalid argument > " After this patch: > " # dd if=/dev/zero of=nfs/file bs=1M count=1 > " # xfs_io -c "dedupe nfs/file 4k 64k 4k" nfs/file > " deduped 4096/4096 bytes at offset 65536 > " 4 KiB, 1 ops; 0.0046 sec (865.988 KiB/sec and 216.4971 ops/sec)" > > This sort of looks like monkeypatching to make an error message go away. > One could argue that this ought to return EOPNOSUPP instead of EINVAL, > and maybe that's what should've happened. > > So, uh, do NFS and CIFS both support server-side dedupe now, or are > these patches just plain wrong? > > No, they're just wrong, because I can corrupt files like so on NFS: > > $ rm -rf urk moo > $ xfs_io -f -c "pwrite -S 0x58 0 31048" urk > wrote 31048/31048 bytes at offset 0 > 30 KiB, 8 ops; 0.0000 sec (569.417 MiB/sec and 153846.1538 ops/sec) > $ xfs_io -f -c "pwrite -S 0x59 0 31048" moo > wrote 31048/31048 bytes at offset 0 > 30 KiB, 8 ops; 0.0001 sec (177.303 MiB/sec and 47904.1916 ops/sec) > $ md5sum urk moo > 37d3713e5f9c4fe0f8a1f813b27cb284 urk > a5b6f953f27aa17e42450ff4674fa2df moo > $ xfs_io -c "dedupe urk 0 0 4096" moo > deduped 4096/4096 bytes at offset 0 > 4 KiB, 1 ops; 0.0012 sec (3.054 MiB/sec and 781.8608 ops/sec) > $ md5sum urk moo > 37d3713e5f9c4fe0f8a1f813b27cb284 urk > 2c992d70131c489da954f1d96d8c456e moo > > (Not sure about cifs, since I don't have a Windows Server handy) > > I'm not an expert in CIFS or NFS, so I'm asking: do either support > dedupe or is this a kernel bug? NFS does not support dedupe and only supports cloning (whole) files.