On Sat, Dec 02, 2023 at 10:14:20PM +0000, David Howells wrote:
>
> I've been running "-g quick" on a CIFS mount and it got to generic/304... and
> is still there nearly 30 hours later. The kernel isn't stuck - userspace is
> cranking out BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE at a great rate.
>
> The ps tree looks like:
>
> S+ 0:03 \_ /bin/bash ./check -E .exclude -g quick
> S+ 0:00 \_ /bin/bash /root/xfstests-dev/tests/generic/304
> S+ 0:00 \_ /bin/bash /root/xfstests-dev/tests/generic/304
> Dl+ 316:55 | \_ /usr/sbin/xfs_io -i -f -c dedupe /xfstest.test/test-304/file0 0 0 9223372036854775807 /xfstest.test/test-

$ printf "0x%x\n" 9223372036854775807
0x7fffffffffffffff
$

So this is asking for dedupe of the entire empty file. The files are
both sparse, so dedupe should be instantaneous because there is
nothing to dedupe.

> S+ 0:00 \_ /bin/bash /root/xfstests-dev/tests/generic/304
> S+ 0:00 \_ sed -e s/^dedupe:/XFS_IOC_FILE_EXTENT_SAME:/g
>
> The xfs_io command strace is an endlessly repeated:
>
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438053887770624, src_length=8833933982967005183, dest_count=1, info=[{dest_fd=3, dest_offset=389438053887770624}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438054961512448, src_length=8833933981893263359, dest_count=1, info=[{dest_fd=3, dest_offset=389438054961512448}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438056035254272, src_length=8833933980819521535, dest_count=1, info=[{dest_fd=3, dest_offset=389438056035254272}]} => {info=[{bytes_deduped=1073741824, status=0}]}) = 0
> ioctl(4, BTRFS_IOC_FILE_EXTENT_SAME or FIDEDUPERANGE, {src_offset=389438057108996096, src_length=8833933979745779711, dest_count=1, info=[{dest_fd=3, dest_offset=389438057108996096}]}^Cstrace: Process 105030 detache

And this indicates the src_offset is going up by
1GB and the src_length is going down by 1GB on each pass. So what is
happening here is that either the CIFS client or the server is only
able to process dedupe requests in 1GB chunks, yet the files are 8EB
in size. That's only about 8.6 billion round trips to the server - it
should be done in a few thousand years.

So, nothing apparently wrong with the test or xfs_io - it's the
filesystem dedupe max range limits that appear to be the issue.

>
> with the dest_offset increasing a bit each time. The log so far is:
>
> wrote 1/1 bytes at offset 9223372036854775806
> 1.000000 bytes, 1 ops; 0.0000 sec (97.656 KiB/sec and 100000.0000 ops/sec)
> wrote 1/1 bytes at offset 9223372036854775806
> 1.000000 bytes, 1 ops; 0.0000 sec (97.656 KiB/sec and 100000.0000 ops/sec)
> wrote 1/1 bytes at offset 1048575
> 1.000000 bytes, 1 ops; 0.0000 sec (139.509 KiB/sec and 142857.1429 ops/sec)
>
> Looking in the protocol dump, it's endlessly repeating:
>
> 190 0.007930488 192.168.6.2 → 192.168.6.1 SMB2 174 SetInfo Request FILE_INFO/SMB2_FILE_ENDOFFILE_INFO
> 191 0.007962785 192.168.6.1 → 192.168.6.2 SMB2 136 SetInfo Response
> 192 0.007974644 192.168.6.2 → 192.168.6.1 SMB2 230 Ioctl Request FILE_SYSTEM Function:0x00d1
> 193 0.008069283 192.168.6.1 → 192.168.6.2 SMB2 182 Ioctl Response FILE_SYSTEM Function:0x00d1

Huh. I think CIFS is completely broken w.r.t. dedupe requests.

The client passes the ->remap_file_range() call to the server via
->duplicate_extents() to execute, but it does not pass any of the
remap flags to the server. One of those remap flags is
REMAP_FILE_DEDUP, and that tells the filesystem that it is a dedupe
operation, not a clone operation.

The CIFS client implements this callout in smb2_duplicate_extents(),
which translates to a FSCTL_DUPLICATE_EXTENTS_TO_FILE smb2 operation.
The server side takes this operation and calls vfs_clone_file_range()
and/or vfs_copy_file_range(), which means that the server can only
execute a clone operation via this protocol request.
Hence it executes a clone/copy operation on receipt rather than a
dedupe operation and hence is potentially destroying the data in the
destination file.

Yeah, that's bad, but the server is only doing what the client asked
it to do. Really, this looks like a CIFS client bug - it should not
be advertising support for FIDEDUPERANGE to applications. i.e. the
smb2 protocol implementation only appears to support FICLONERANGE
semantics and so the client should be returning -EOPNOTSUPP when
REMAP_FILE_DEDUP is set in ->remap_file_range().

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
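
For illustration only, the check suggested above can be modelled in
userspace roughly as follows. This is a sketch, not the actual CIFS
code: model_remap_file_range() is a hypothetical stand-in for the
client's ->remap_file_range() method, and the flag values are assumed
to match the kernel's include/linux/fs.h.

```c
#include <assert.h>
#include <errno.h>

/*
 * Assumption: these values mirror the kernel's include/linux/fs.h.
 * They are redefined here only so the sketch builds in userspace.
 */
#define REMAP_FILE_DEDUP        (1u << 0)
#define REMAP_FILE_CAN_SHORTEN  (1u << 1)

/*
 * Hypothetical stand-in for the client's ->remap_file_range() method.
 * Returns the number of bytes remapped, or a negative errno.
 */
long long model_remap_file_range(long long len, unsigned int remap_flags)
{
	assert(len >= 0);

	/*
	 * The wire operation can only express a clone, so a dedupe
	 * request must be refused rather than silently turned into one.
	 */
	if (remap_flags & REMAP_FILE_DEDUP)
		return -EOPNOTSUPP;

	/*
	 * Clone path: the real client would issue the duplicate-extents
	 * request here. The model just reports the full length.
	 */
	return len;
}
```

With a check like this in place, an application calling FIDEDUPERANGE
would get an error back instead of having the destination range
overwritten by a clone, while FICLONERANGE callers are unaffected.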