Re: generic/304 doesn't seem terminable for cifs

On Sat, Dec 02, 2023 at 10:14:20PM +0000, David Howells wrote:
> 
> I've been running "-g quick" on a CIFS mount and it got to generic/304... and
> is still there nearly 30 hours later.  The kernel isn't stuck - userspace is
> cranking out BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE at a great rate.
> 
> The ps tree looks like:
> 
> S+     0:03       \_ /bin/bash ./check -E .exclude -g quick
> S+     0:00           \_ /bin/bash /root/xfstests-dev/tests/generic/304
> S+     0:00               \_ /bin/bash /root/xfstests-dev/tests/generic/304
> Dl+  316:55               |   \_ /usr/sbin/xfs_io -i -f -c dedupe /xfstest.test/test-304/file0 0 0 9223372036854775807 /xfstest.test/test-

$ printf "0x%x\n" 9223372036854775807
0x7fffffffffffffff
$

So this is asking for dedupe of the entire empty file. The files
are both sparse, so dedupe should be instantaneous because there is
nothing to dedupe.


> S+     0:00               \_ /bin/bash /root/xfstests-dev/tests/generic/304
> S+     0:00                   \_ sed -e s/^dedupe:/XFS_IOC_FILE_EXTENT_SAME:/g
> 
> The xfs_io command strace is an endlessly repeated:
> 
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438053887770624, src_length=8833933982967005183, dest_count=1, info=[{dest_fd=3, dest_offset=389438053887770624}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438054961512448, src_length=8833933981893263359, dest_count=1, info=[{dest_fd=3, dest_offset=389438054961512448}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438056035254272, src_length=8833933980819521535, dest_count=1, info=[{dest_fd=3, dest_offset=389438056035254272}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438057108996096, src_length=8833933979745779711, dest_count=1, info=[{dest_fd=3, dest_offset=389438057108996096}]}^Cstrace: Process 105030 detache

And this indicates the src_offset is going up by 1GB and the
src_length is going down by 1GB on each pass.
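
That reading of the strace output can be checked by redoing the
arithmetic on the quoted numbers. The sketch below is only
illustrative Python: the helper name check_strace_steps is made up
for this example, and the values are copied from the four ioctl
lines quoted above.

```python
# (src_offset, src_length) pairs copied from the strace output quoted above.
CALLS = [
    (389438053887770624, 8833933982967005183),
    (389438054961512448, 8833933981893263359),
    (389438056035254272, 8833933980819521535),
    (389438057108996096, 8833933979745779711),
]

GIB = 1 << 30  # the bytes_deduped value each ioctl reports (1073741824)


def check_strace_steps(calls):
    """Return (offset deltas, length deltas, end of requested range per call)."""
    offset_steps = [b[0] - a[0] for a, b in zip(calls, calls[1:])]
    length_steps = [a[1] - b[1] for a, b in zip(calls, calls[1:])]
    range_ends = [off + length for off, length in calls]
    return offset_steps, length_steps, range_ends


offset_steps, length_steps, range_ends = check_strace_steps(CALLS)
print(offset_steps)                   # how far src_offset moved per call
print(length_steps)                   # how much src_length shrank per call
print([hex(e) for e in range_ends])   # where each request ends
```

Each call advances by exactly the 1GiB that bytes_deduped reports,
and offset + length always lands on 0x7fffffffffffffff, so xfs_io is
simply resubmitting the remainder of the original request.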

So what is happening here is that either the CIFS client or the
server is only able to process dedupe requests in 1GB chunks, yet
the files are 8EB in size. That's only 8.5 billion round trips to
the server - it should be done in a few thousand years.
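
For reference, the round trip count falls straight out of the
numbers. This is a minimal sketch assuming every call is capped at
1GiB, as the strace suggests; the function names are hypothetical
and the loop only models what the caller does with a short dedupe
result, it is not the xfs_io source.

```python
def dedupe_round_trips(total_len, max_chunk):
    """Count calls needed when each call handles at most max_chunk bytes."""
    calls = 0
    remaining = total_len
    while remaining > 0:
        done = min(remaining, max_chunk)  # a short result, like bytes_deduped
        remaining -= done                 # caller retries with the remainder
        calls += 1
    return calls


def round_trips_closed_form(total_len, max_chunk):
    """Same count via ceiling division, usable for the real 2^63 - 1 length."""
    return -(-total_len // max_chunk)


GIB = 1 << 30
# Small case: the loop and the closed form agree.
print(dedupe_round_trips(10 * GIB + 1, GIB))    # 11
# The request from the test: 0x7fffffffffffffff bytes in 1GiB chunks.
print(round_trips_closed_form(2**63 - 1, GIB))  # 8589934592
```

That is 2^33 calls, which is where the 8.5 billion figure comes from.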

So, nothing apparently wrong with the test or xfs_io - it's the
filesystem dedupe max range limits that appear to be the issue.

> 
> with the dest_offset increasing a bit each time.  The log so far is:
> 
> wrote 1/1 bytes at offset 9223372036854775806
> 1.000000 bytes, 1 ops; 0.0000 sec (97.656 KiB/sec and 100000.0000 ops/sec)
> wrote 1/1 bytes at offset 9223372036854775806
> 1.000000 bytes, 1 ops; 0.0000 sec (97.656 KiB/sec and 100000.0000 ops/sec)
> wrote 1/1 bytes at offset 1048575
> 1.000000 bytes, 1 ops; 0.0000 sec (139.509 KiB/sec and 142857.1429 ops/sec)
> 
> Looking in the protocol dump, it's endlessly repeating:
> 
>   190 0.007930488  192.168.6.2 → 192.168.6.1  SMB2 174 SetInfo Request FILE_INFO/SMB2_FILE_ENDOFFILE_INFO
>   191 0.007962785  192.168.6.1 → 192.168.6.2  SMB2 136 SetInfo Response
>   192 0.007974644  192.168.6.2 → 192.168.6.1  SMB2 230 Ioctl Request FILE_SYSTEM Function:0x00d1
>   193 0.008069283  192.168.6.1 → 192.168.6.2  SMB2 182 Ioctl Response FILE_SYSTEM Function:0x00d1

Huh. I think CIFS is completely broken w.r.t. dedupe requests.

The client passes the ->remap_file_range() call to the server via
->duplicate_extents() to execute, but it does not pass any of the
remap flags to the server. One of those remap flags is
REMAP_FILE_DEDUP, and that tells the filesystem that it is a dedupe
operation, not a clone operation.

The CIFS client implements this callout in smb2_duplicate_extents(),
which translates to a FSCTL_DUPLICATE_EXTENTS_TO_FILE smb2
operation. The server side takes this operation and calls
vfs_clone_file_range() and/or vfs_copy_file_range(), which means
that the server can only execute a clone operation via this protocol
request. Hence it executes a clone/copy operation on receipt rather
than a dedupe operation and hence is potentially destroying the data
in the destination file.
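
To make the difference in semantics concrete, here is a toy model in
Python using in-memory buffers instead of files. The function names
and the None return value are made up for illustration; this is not
the kernel or server code, just the contract each operation is
supposed to honour.

```python
def clone_range(src, dst, offset, length):
    """Clone/copy semantics: the destination range is unconditionally replaced."""
    dst[offset:offset + length] = src[offset:offset + length]
    return length


def dedupe_range(src, dst, offset, length):
    """Dedupe semantics: share only if contents already match; never alter data."""
    if src[offset:offset + length] != dst[offset:offset + length]:
        return None  # stands in for a "range differs" status
    return length    # ranges are identical, so sharing them changes nothing


src = bytearray(b"AAAA")
dst = bytearray(b"BBBB")
print(dedupe_range(src, dst, 0, 4), bytes(dst))  # dst is left untouched
print(clone_range(src, dst, 0, 4), bytes(dst))   # dst now holds src's data
```

A dedupe request that is executed as a clone behaves like the second
call: destination data that differed from the source is overwritten.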

Yeah, that's bad, but the server is only doing what the client asked
it to do.

Really, this looks like a CIFS client bug - it should not be
advertising support for FIDEDUPERANGE to applications. i.e. the smb2
protocol implementation only appears to support FICLONERANGE
semantics and so the client should be returning -EOPNOTSUPP when
REMAP_FILE_DEDUP is set in ->remap_file_range().
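
The shape of that check, modelled in Python rather than kernel C, is
roughly the following. Everything here is a sketch under
assumptions: the function name is hypothetical, the flag value is
what I believe the kernel headers use for the dedupe remap flag
(spelled REMAP_FILE_DEDUP there, as far as I know), and do_clone
stands in for whatever the existing clone path does.

```python
import errno

# Assumed to mirror the kernel's dedupe remap flag; treat the value as an assumption.
REMAP_FILE_DEDUP = 1 << 0


def cifs_remap_file_range_sketch(remap_flags, do_clone):
    """Hypothetical model of a ->remap_file_range() that only supports clone."""
    if remap_flags & REMAP_FILE_DEDUP:
        # Refuse dedupe outright instead of silently running a clone.
        return -errno.EOPNOTSUPP
    return do_clone()


# A dedupe request is rejected; a plain clone request goes through.
dedupe_result = cifs_remap_file_range_sketch(REMAP_FILE_DEDUP, lambda: 4096)
clone_result = cifs_remap_file_range_sketch(0, lambda: 4096)
print(dedupe_result, clone_result)
```

With a check like this, FIDEDUPERANGE fails up front and the
destination file is never modified.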

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



