Patch "btrfs: send: avoid unaligned encoded writes when attempting to clone range" has been added to the 5.15-stable tree

This is a note to let you know that I've just added the patch titled

    btrfs: send: avoid unaligned encoded writes when attempting to clone range

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     btrfs-send-avoid-unaligned-encoded-writes-when-attem.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 5d242611e674c83a16db5fe6b7087ab7d3698f49
Author: Filipe Manana <fdmanana@xxxxxxxx>
Date:   Tue Nov 15 16:29:44 2022 +0000

    btrfs: send: avoid unaligned encoded writes when attempting to clone range
    
    [ Upstream commit a11452a3709e217492798cf3686ac2cc8eb3fb51 ]
    
    When trying to see if we can clone a file range, there are cases where we
    end up sending two write operations in case the inode from the source root
    has an i_size that is not sector size aligned and the length from the
    current offset to its i_size is less than the remaining length we are
    trying to clone.
    
    Issuing two write operations when we could instead issue a single write
    operation is not incorrect. However it is not optimal, especially if the
    extents are compressed and the flag BTRFS_SEND_FLAG_COMPRESSED was passed
    to the send ioctl. In that case we can end up sending an encoded write
    with an offset that is not sector size aligned, which makes the receiver
    fall back to decompressing the data and writing it using regular buffered
    IO (so re-compressing the data in case the fs is mounted with compression
    enabled), because encoded writes fail with -EINVAL when an offset is not
    sector size aligned.
    
    The following example shows the non-optimal behaviour due to an unaligned
    encoded write. It also triggered a bug in the receiver's fallback logic
    (decompression + regular buffered IO), which is fixed by the patchset
    referred to in the Link tag at the bottom of this changelog:
    
       $ cat test.sh
       #!/bin/bash
    
       DEV=/dev/sdj
       MNT=/mnt/sdj
    
       mkfs.btrfs -f $DEV > /dev/null
       mount -o compress $DEV $MNT
    
       # File foo has a size of 33K, not aligned to the sector size.
       xfs_io -f -c "pwrite -S 0xab 0 33K" $MNT/foo
    
       xfs_io -f -c "pwrite -S 0xcd 0 64K" $MNT/bar
    
       # Now clone the first 32K of file bar into foo at offset 0.
       xfs_io -c "reflink $MNT/bar 0 0 32K" $MNT/foo
    
       # Snapshot the default subvolume and create a full send stream (v2).
       btrfs subvolume snapshot -r $MNT $MNT/snap
    
       btrfs send --compressed-data -f /tmp/test.send $MNT/snap
    
       echo -e "\nFile bar in the original filesystem:"
       od -A d -t x1 $MNT/snap/bar
    
       umount $MNT
       mkfs.btrfs -f $DEV > /dev/null
       mount $DEV $MNT
    
       echo -e "\nReceiving stream in a new filesystem..."
       btrfs receive -f /tmp/test.send $MNT
    
       echo -e "\nFile bar in the new filesystem:"
       od -A d -t x1 $MNT/snap/bar
    
       umount $MNT
    
    Before this patch, the send stream included one regular write and one
    encoded write for file 'bar', with the latter not being sector size
    aligned and causing the receiver to fall back to decompression + buffered
    writes. The output of the btrfs receive command in verbose mode (-vvv):
    
       (...)
       mkfile o258-7-0
       rename o258-7-0 -> bar
       utimes
       clone bar - source=foo source offset=0 offset=0 length=32768
       write bar - offset=32768 length=1024
       encoded_write bar - offset=33792, len=4096, unencoded_offset=33792, unencoded_file_len=31744, unencoded_len=65536, compression=1, encryption=0
       encoded_write bar - falling back to decompress and write due to errno 22 ("Invalid argument")
       (...)
    
    This patch avoids the regular write followed by an unaligned encoded write
    so that we end up sending a single encoded write that is aligned. So after
    this patch the stream content is (output of btrfs receive -vvv):
    
       (...)
       mkfile o258-7-0
       rename o258-7-0 -> bar
       utimes
       clone bar - source=foo source offset=0 offset=0 length=32768
       encoded_write bar - offset=32768, len=4096, unencoded_offset=32768, unencoded_file_len=32768, unencoded_len=65536, compression=1, encryption=0
       (...)
    
    So we get more optimal behaviour and avoid the silent data loss bug in
    versions of btrfs-progs affected by the bug referred to by the Link tag
    below (btrfs-progs v5.19, v5.19.1, v6.0 and v6.0.1).
    
    Link: https://lore.kernel.org/linux-btrfs/cover.1668529099.git.fdmanana@xxxxxxxx/
    Reviewed-by: Boris Burkov <boris@xxxxxx>
    Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
    Signed-off-by: David Sterba <dsterba@xxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
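
The alignment argument in the changelog above can be checked with a little arithmetic. The following is a minimal user-space sketch, not kernel code: the helper name is_sector_aligned and the SECTORSIZE constant are hypothetical, and a 4096-byte sector size is an assumption (the changelog only says that 33K is not sector size aligned). It shows why the encoded write at offset 33792 is rejected while the one at offset 32768 is acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed sector size for illustration only; not stated in the changelog. */
#define SECTORSIZE 4096u

/*
 * Hypothetical helper: returns true when offset is a multiple of
 * sectorsize. sectorsize must be a non-zero power of two.
 */
static bool is_sector_aligned(uint64_t offset, uint32_t sectorsize)
{
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);
	return (offset & ((uint64_t)sectorsize - 1)) == 0;
}
```

Under that assumption, is_sector_aligned(33792, SECTORSIZE) is false (33792 = 8 * 4096 + 1024), which corresponds to the encoded write that fell back to decompression before the patch, while is_sector_aligned(32768, SECTORSIZE) is true, which corresponds to the single encoded write sent after the patch.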

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 4d2c6ce29fe5..9250a17731bd 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -5398,6 +5398,7 @@ static int clone_range(struct send_ctx *sctx,
 		u64 ext_len;
 		u64 clone_len;
 		u64 clone_data_offset;
+		bool crossed_src_i_size = false;
 
 		if (slot >= btrfs_header_nritems(leaf)) {
 			ret = btrfs_next_leaf(clone_root->root, path);
@@ -5454,8 +5455,10 @@ static int clone_range(struct send_ctx *sctx,
 		if (key.offset >= clone_src_i_size)
 			break;
 
-		if (key.offset + ext_len > clone_src_i_size)
+		if (key.offset + ext_len > clone_src_i_size) {
 			ext_len = clone_src_i_size - key.offset;
+			crossed_src_i_size = true;
+		}
 
 		clone_data_offset = btrfs_file_extent_offset(leaf, ei);
 		if (btrfs_file_extent_disk_bytenr(leaf, ei) == disk_byte) {
@@ -5515,6 +5518,25 @@ static int clone_range(struct send_ctx *sctx,
 				ret = send_clone(sctx, offset, clone_len,
 						 clone_root);
 			}
+		} else if (crossed_src_i_size && clone_len < len) {
+			/*
+			 * If we are at i_size of the clone source inode and we
+			 * can not clone from it, terminate the loop. This is
+			 * to avoid sending two write operations, one with a
+			 * length matching clone_len and the final one after
+			 * this loop with a length of len - clone_len.
+			 *
+			 * When using encoded writes (BTRFS_SEND_FLAG_COMPRESSED
+			 * was passed to the send ioctl), this helps avoid
+			 * sending an encoded write for an offset that is not
+			 * sector size aligned, in case the i_size of the source
+			 * inode is not sector size aligned. That will make the
+			 * receiver fallback to decompression of the data and
+			 * writing it using regular buffered IO, therefore while
+			 * not incorrect, it's not optimal due decompression and
+			 * possible re-compression at the receiver.
+			 */
+			break;
 		} else {
 			ret = send_extent_data(sctx, offset, clone_len);
 		}
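
The effect of the new branch can be illustrated with a simplified user-space model. This is only a sketch under stated assumptions, not the kernel's clone_range(): the struct write_op type and the plan_writes() function are hypothetical, the model covers only the case where the source extent at the current offset cannot be cloned, and it assumes that source and destination offsets coincide (as in the changelog's example). It reports which write operations would be emitted for the remaining range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one write operation placed in the send stream. */
struct write_op {
	uint64_t offset;
	uint64_t len;
};

/*
 * Simplified model (an assumption, not the kernel code): the source
 * extent at 'offset' cannot be cloned and may extend past the source
 * inode's i_size. Fills 'ops' with the writes emitted for the range
 * [offset, offset + len) and returns how many there are (at most 2).
 */
static size_t plan_writes(uint64_t offset, uint64_t len, uint64_t src_i_size,
			  bool patched, struct write_op ops[2])
{
	size_t n = 0;
	uint64_t clone_len = len;
	bool crossed_src_i_size = false;

	assert(ops != NULL);

	/* Truncate at the source i_size, mirroring the ext_len clamp. */
	if (offset < src_i_size && offset + len > src_i_size) {
		clone_len = src_i_size - offset;
		crossed_src_i_size = true;
	}

	/* With the patch, stop here instead of emitting a short write. */
	if (!(patched && crossed_src_i_size && clone_len < len)) {
		ops[n++] = (struct write_op){ offset, clone_len };
		offset += clone_len;
		len -= clone_len;
	}

	/* The final write after the loop covers whatever remains. */
	if (len > 0)
		ops[n++] = (struct write_op){ offset, len };

	return n;
}
```

With the values from the changelog (offset 32768, remaining length 32768, source i_size 33792), the unpatched model yields two writes, {32768, 1024} and {33792, 31744}, matching the regular write and the unaligned encoded write in the first receive log. The patched model yields the single write {32768, 32768}, matching the second log.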
