Re: [PATCH v2] btrfs: fix possible free space tree corruption with online conversion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 11, 2020 at 8:56 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> While running btrfs/011 in a loop I would often ASSERT() while trying to
> add a new free space entry that already existed, or get an -EEXIST while
> adding a new block to the extent tree, which is another indication of
> double allocation.
>
> This occurs because when we do the free space tree population, we create
> the new root and then populate the tree and commit the transaction.
> The problem is when you create a new root, the root node and commit root
> node are the same.  This means that caching a block group before the
> transaction is committed can race with other operations modifying the
> free space tree, and thus you can get double adds and other sort of
> shenanigans.  This is only a problem for the first transaction, once
> we've committed the transaction we created the free space tree in we're
> OK to use the free space tree to cache block groups.
>
> Fix this by marking the fs_info as unsafe to load the free space tree,
> and fall back on the old slow method.  We could be smarter than this,
> for example caching the block group while we're populating the free
> space tree, but since this is a serious problem I've opted for the
> simplest solution.
>
> cc: stable@xxxxxxxxxxxxxxx
> Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Looks good, but the explanation you gave in the reply to Nikolay's
comment makes it easier to realize how the problem happens.
I think it should be mentioned, in the changelog, that the operations
that can concurrently modify the free space tree are the insertions
from running delayed references.

Anyway,

Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>

Thanks.

> ---
>  fs/btrfs/block-group.c     | 11 ++++++++++-
>  fs/btrfs/ctree.h           |  3 +++
>  fs/btrfs/free-space-tree.c | 10 +++++++++-
>  3 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 52f2198d44c9..b8bbdd95743e 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -673,7 +673,16 @@ static noinline void caching_thread(struct btrfs_work *work)
>                 wake_up(&caching_ctl->wait);
>         }
>
> -       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE))
> +       /*
> +        * If we are in the transaction that populated the free space tree we
> +        * can't actually cache from the free space tree as our commit root and
> +        * real root are the same, so we could change the contents of the blocks
> +        * while caching.  Instead do the slow caching in this case, and after
> +        * the transaction has committed we will be safe.
> +        */
> +       if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) &&
> +           !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
> +                      &fs_info->flags)))
>                 ret = load_free_space_tree(caching_ctl);
>         else
>                 ret = load_extent_tree_free(caching_ctl);
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 3935d297d198..4a60d81da5cb 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -562,6 +562,9 @@ enum {
>
>         /* Indicate that we need to cleanup space cache v1 */
>         BTRFS_FS_CLEANUP_SPACE_CACHE_V1,
> +
> +       /* Indicate that we can't trust the free space tree for caching yet. */
> +       BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED,
>  };
>
>  /*
> diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c
> index e33a65bd9a0c..a33bca94d133 100644
> --- a/fs/btrfs/free-space-tree.c
> +++ b/fs/btrfs/free-space-tree.c
> @@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
>                 return PTR_ERR(trans);
>
>         set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
>         free_space_root = btrfs_create_tree(trans,
>                                             BTRFS_FREE_SPACE_TREE_OBJECTID);
>         if (IS_ERR(free_space_root)) {
> @@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info)
>         btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE);
>         btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID);
>         clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       ret = btrfs_commit_transaction(trans);
>
> -       return btrfs_commit_transaction(trans);
> +       /*
> +        * Now that we've committed the transaction any reading of our commit
> +        * root will be safe, so we can cache from the free space tree now.
> +        */
> +       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
> +       return ret;
>
>  abort:
>         clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags);
> +       clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags);
>         btrfs_abort_transaction(trans, ret);
>         btrfs_end_transaction(trans);
>         return ret;
> --
> 2.26.2
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux