Re: [PATCH] btrfs: fix wrong block_start calculation for btrfs_drop_extent_map_range()

On Sun, Apr 7, 2024 at 2:18 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
> [BUG]
> During my extent_map cleanup/refactor, with some extra (possibly too
> strict) sanity checks, extent-map-tests::test_case_7() would trip my
> extent_map sanity checks.
>
> The problem is that, after btrfs_drop_extent_map_range(), the resulting
> extent_map has a @block_start that is way too large, while my
> btrfs_file_extent_item-based members return a correct @disk_bytenr
> and a correct @offset.
>
> The extent map layout looks like this:
>
>      0        16K    32K       48K
>      | PINNED |      | Regular |
>
> The regular em at [32K, 48K) also has a @block_start of 32K.
>
> Then we drop the range [0, 36K), which should shrink the regular one to
> [36K, 48K).
> However, the resulting @block_start is incorrect: we expect 32K + 4K, but
> got 52K.
>
> [CAUSE]
> Inside btrfs_drop_extent_map_range(), if we hit an extent_map that
> overlaps the target range but extends beyond its end, we need to split
> that extent map in two:
>
>         |<-- drop range -->|
>                  |<----- existing extent_map --->|
>
> And if the extent map is not compressed, we need to forward
> extent_map::block_start by the difference between the end of drop range
> and the extent map start.
>
> However, in that particular case, the difference is calculated using
> (start + len - em->start).
>
> The problem is @start can be modified if the drop range covers any
> pinned extent.
>
> This leads to a wrong calculation, which would be caught by my later
> extent_map sanity checks that compare em::block_start against
> btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
>
> Unfortunately, this is going to cause data corruption, as the split em
> points to an incorrect location, which can cause either unexpected read
> errors or wild writes.

It can't happen for either reads or writes actually.

As for writes, it can't happen because:

1) The issue only happens when skip_pinned is true, which is the only
case that adjusts the 'start' variable (parameter);

2) All IO paths pass false for the skip_pinned parameter; only
relocation passes true when replacing the bytenr in file extent items,
and the range it uses for btrfs_drop_extent_map_range() matches the
extent item's range, so it won't cover extent maps outside the range;

3) Extent maps for writes in progress are always pinned;

4) Before doing IO on a range we lock the range and wait for any
existing ordered extents in the range to complete, which results in
unpinning extent maps;

5) Extent maps for writes are created when running delalloc (or during
the write for direct IO), along with the ordered extent, and are
created as pinned.

With all these, I don't see how we can get a "wild write" or any
problem in a write path.

As for reads, it doesn't happen because of what's said in 2 regarding
the range passed to btrfs_drop_extent_map_range().

So as far as I can see, it's currently a harmless bug, and maybe it
always has been, since the bad calculation has been there since 2008
(see below).
If it affected reads or writes, it would be easy to trigger with
fstests (the fsx-based tests, for example).

It's certainly a bug; it just doesn't have any consequences as far as
I can see, so the changelog should be updated.
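To make the [BUG]/[CAUSE] arithmetic concrete, here is a minimal userspace sketch. It is not kernel code: struct toy_em, test_case_7_layout and split_block_start() are hypothetical names, the loop only models the skip_pinned walk plus the tail split of an uncompressed map, and it assumes (as the patch states) that @end is exclusive at the point where the split is computed. On the test_case_7 layout it reproduces the 52K (old formula) vs 36K (fixed formula) values from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's SZ_* constants from <linux/sizes.h>. */
#define SZ_4K  (4ULL * 1024)
#define SZ_16K (16ULL * 1024)
#define SZ_32K (32ULL * 1024)

/* Hypothetical, stripped-down model of struct extent_map. */
struct toy_em {
	uint64_t start;       /* file offset where the mapping begins */
	uint64_t len;         /* mapping length in bytes */
	uint64_t block_start; /* disk bytenr backing @start (uncompressed) */
	bool pinned;
};

/* Layout from test_case_7: PINNED [0, 16K), hole, regular [32K, 48K). */
const struct toy_em test_case_7_layout[2] = {
	{ .start = 0, .len = SZ_16K, .block_start = 0 /* arbitrary, unused */, .pinned = true },
	{ .start = SZ_32K, .len = SZ_16K, .block_start = SZ_32K, .pinned = false },
};

/*
 * Model the skip_pinned walk over the drop range [start, end) and return
 * the block_start given to the tail split of a regular map that extends
 * past @end, or 0 if no map is split.
 * @buggy selects the old formula (start + len - em->start); otherwise the
 * fixed one (end - em->start) is used.
 */
uint64_t split_block_start(const struct toy_em *ems, int nr,
			   uint64_t start, uint64_t end, bool buggy)
{
	const uint64_t len = end - start; /* computed once, never updated */
	uint64_t result = 0;

	for (int i = 0; i < nr; i++) {
		const struct toy_em *em = &ems[i];
		const uint64_t em_end = em->start + em->len;

		if (em_end <= start || em->start >= end)
			continue;		/* no overlap with the drop range */
		if (em->pinned) {
			start = em_end;		/* skip_pinned: only @start moves */
			continue;
		}
		if (em_end > end) {
			/* Keep the tail [end, em_end) as the split map. */
			uint64_t diff;

			assert(end > em->start);
			diff = buggy ? start + len - em->start : end - em->start;
			result = em->block_start + diff;
		}
	}
	return result;
}
```

Dropping [0, 36K) moves @start to 16K after the pinned map, so the old formula gives 16K + 36K - 32K = 20K and a block_start of 52K, while end - em->start gives 4K and 36K. With a drop range of [32K, 36K) nothing is pinned, @start never moves, and both formulas agree on 36K, which is consistent with point 1 above.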

>
> [FIX]
> Fix it by not using @start at all, and using @end - em->start
> instead, where @end is an exclusive bytenr.
>
> Also update the test case to verify @block_start, to prevent such a
> problem from happening again.
>
> CC: stable@xxxxxxxxxxxxxxx # 6.7+
> Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")

That commit doesn't influence how split->block_start is updated, only
split->start and split->len.
So I can't understand why you chose to blame that commit.

The bug was actually introduced in 2008 by the following commit:

3b951516ed70 ("Btrfs: Use the extent map cache to find the logical disk block during data retries")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3b951516ed703af0f6d82053937655ad69b60864

> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
>  fs/btrfs/extent_map.c             | 2 +-
>  fs/btrfs/tests/extent-map-tests.c | 6 +++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 471654cb65b0..955ce300e5a1 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -799,7 +799,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
>                                         split->block_len = em->block_len;
>                                         split->orig_start = em->orig_start;
>                                 } else {
> -                                       const u64 diff = start + len - em->start;
> +                                       const u64 diff = end - em->start;
>
>                                         split->block_len = split->len;
>                                         split->block_start += diff;
> diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
> index 253cce7ffecf..80e71c5cb7ab 100644
> --- a/fs/btrfs/tests/extent-map-tests.c
> +++ b/fs/btrfs/tests/extent-map-tests.c
> @@ -818,7 +818,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
>                 test_err("em->len is %llu, expected 16K", em->len);
>                 goto out;
>         }
> -

Please avoid such accidental changes.

Thanks.

>         free_extent_map(em);
>
>         read_lock(&em_tree->lock);
> @@ -847,6 +846,11 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
>                 goto out;
>         }
>
> +       if (em->block_start != SZ_32K + SZ_4K) {
> +               test_err("em->block_start is %llu, expected 36K", em->block_start);
> +               goto out;
> +       }
> +
>         free_extent_map(em);
>
>         read_lock(&em_tree->lock);
> --
> 2.44.0
>
>




