On Fri, Dec 11, 2020 at 8:56 AM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > While running btrfs/011 in a loop I would often ASSERT() while trying to > add a new free space entry that already existed, or get an -EEXIST while > adding a new block to the extent tree, which is another indication of > double allocation. > > This occurs because when we do the free space tree population, we create > the new root and then populate the tree and commit the transaction. > The problem is when you create a new root, the root node and commit root > node are the same. This means that caching a block group before the > transaction is committed can race with other operations modifying the > free space tree, and thus you can get double adds and other sort of > shenanigans. This is only a problem for the first transaction, once > we've committed the transaction we created the free space tree in we're > OK to use the free space tree to cache block groups. > > Fix this by marking the fs_info as unsafe to load the free space tree, > and fall back on the old slow method. We could be smarter than this, > for example caching the block group while we're populating the free > space tree, but since this is a serious problem I've opted for the > simplest solution. > > cc: stable@xxxxxxxxxxxxxxx > Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree") > Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx> Looks good, but the explanation you gave in the reply to Nikolay's comment makes it easier to realize how the problem happens. I think it should be mentioned, in the changelog, that the operations that can concurrently modify the free space tree are the insertions from running delayed references. Anyway, Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> Thanks. > --- > fs/btrfs/block-group.c | 11 ++++++++++- > fs/btrfs/ctree.h | 3 +++ > fs/btrfs/free-space-tree.c | 10 +++++++++- > 3 files changed, 22 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c > index 52f2198d44c9..b8bbdd95743e 100644 > --- a/fs/btrfs/block-group.c > +++ b/fs/btrfs/block-group.c > @@ -673,7 +673,16 @@ static noinline void caching_thread(struct btrfs_work *work) > wake_up(&caching_ctl->wait); > } > > - if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE)) > + /* > + * If we are in the transaction that populated the free space tree we > + * can't actually cache from the free space tree as our commit root and > + * real root are the same, so we could change the contents of the blocks > + * while caching. Instead do the slow caching in this case, and after > + * the transaction has committed we will be safe. > + */ > + if (btrfs_fs_compat_ro(fs_info, FREE_SPACE_TREE) && > + !(test_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, > + &fs_info->flags))) > ret = load_free_space_tree(caching_ctl); > else > ret = load_extent_tree_free(caching_ctl); > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index 3935d297d198..4a60d81da5cb 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -562,6 +562,9 @@ enum { > > /* Indicate that we need to cleanup space cache v1 */ > BTRFS_FS_CLEANUP_SPACE_CACHE_V1, > + > + /* Indicate that we can't trust the free space tree for caching yet. */ > + BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, > }; > > /* > diff --git a/fs/btrfs/free-space-tree.c b/fs/btrfs/free-space-tree.c > index e33a65bd9a0c..a33bca94d133 100644 > --- a/fs/btrfs/free-space-tree.c > +++ b/fs/btrfs/free-space-tree.c > @@ -1150,6 +1150,7 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info) > return PTR_ERR(trans); > > set_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags); > + set_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags); > free_space_root = btrfs_create_tree(trans, > BTRFS_FREE_SPACE_TREE_OBJECTID); > if (IS_ERR(free_space_root)) { > @@ -1171,11 +1172,18 @@ int btrfs_create_free_space_tree(struct btrfs_fs_info *fs_info) > btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE); > btrfs_set_fs_compat_ro(fs_info, FREE_SPACE_TREE_VALID); > clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags); > + ret = btrfs_commit_transaction(trans); > > - return btrfs_commit_transaction(trans); > + /* > + * Now that we've committed the transaction any reading of our commit > + * root will be safe, so we can cache from the free space tree now. > + */ > + clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags); > + return ret; > > abort: > clear_bit(BTRFS_FS_CREATING_FREE_SPACE_TREE, &fs_info->flags); > + clear_bit(BTRFS_FS_FREE_SPACE_TREE_UNTRUSTED, &fs_info->flags); > btrfs_abort_transaction(trans, ret); > btrfs_end_transaction(trans); > return ret; > -- > 2.26.2 > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”