On 2021/10/2 01:57, Josef Bacik wrote:
I hit a stuck relocation on btrfs/061 during my overnight testing. This turned out to be because we had left over extent entries in our extent root for a data reloc inode that no longer existed. This happened because in btrfs_drop_extents() we only update refs if we have SHAREABLE set or we are the tree_root. This regression was introduced by aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree") where we stopped setting SHAREABLE for the data reloc tree.

The problem here is we actually do want to update extent references for data extents in the data reloc tree, in fact we only don't want to update extent references if the file extents are in the log tree. Update this check to only skip updating references in the case of the log tree.

This is relatively rare, because you have to be running scrub at the same time, which is what btrfs/061 does. The data reloc inode has its extents pre-allocated, and then we copy the extent into the pre-allocated chunks. We theoretically should never be calling btrfs_drop_extents() on a data reloc inode. The exception of course is with scrub, if our pre-allocated extent falls inside of the block group we are scrubbing, then the block group will be marked read only and we will be forced to cow that extent. This means we will call btrfs_drop_extents() on that range when we cow that file extent.
Oh my god, I forgot the corner case here!
This isn't really problematic if we do this. The data reloc inode requires that our extent lengths match exactly with the extent we are copying. Thankfully we validate the extent is correct with get_new_location(), so if we happen to cow only part of the extent we won't link it in when we do the relocation, and we are safe from any other shenanigans that arise because of this interaction with scrub.
But this makes me wonder, can we just make scrub and balance mutually exclusive? There are already quite a few such exclusions, like balance and send. Making balance and scrub exclusive to each other shouldn't cause too much hassle, and it would let us remove these checks.
cc: stable@xxxxxxxxxxxxxxx
Fixes: aeb935a45581 ("btrfs: don't set SHAREABLE flag for data reloc tree")
Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
Reviewed-by: Qu Wenruo <wqu@xxxxxxxx>

Thanks for the cause analysis and the fix!

Qu
---
 fs/btrfs/file.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 04e29b40a38e..b7d3559efcf7 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -734,8 +734,7 @@ int btrfs_drop_extents(struct btrfs_trans_handle *trans,
 	if (args->start >= inode->disk_i_size && !args->replace_extent)
 		modify_tree = 0;
 
-	update_refs = (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) ||
-		       root == fs_info->tree_root);
+	update_refs = root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID;
 	while (1) {
 		recow = 0;
 		ret = btrfs_lookup_file_extent(trans, root, path, ino,
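
For readers following along, the difference between the old and new update_refs checks can be sketched as a small userspace model. This is only an illustration and not kernel code: struct model_root, the MODEL_* constants and both helper functions are hypothetical names made up for this sketch, the objectid values are stand-ins chosen for the model, and the kernel's pointer comparison "root == fs_info->tree_root" is approximated here by comparing objectids.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in objectids for this model only (hypothetical names). */
#define MODEL_ROOT_TREE_OBJECTID        ((uint64_t)1)
#define MODEL_FS_TREE_OBJECTID          ((uint64_t)5)
#define MODEL_TREE_LOG_OBJECTID         ((uint64_t)-6)
#define MODEL_DATA_RELOC_TREE_OBJECTID  ((uint64_t)-9)

/* Minimal stand-in for the two pieces of root state the check looks at. */
struct model_root {
	uint64_t objectid;   /* models root->root_key.objectid */
	bool shareable;      /* models the BTRFS_ROOT_SHAREABLE state bit */
};

/*
 * Old check: update refs only for SHAREABLE roots or the tree root.
 * Assumption: "is the tree root" is modelled by objectid instead of
 * the pointer comparison used in the kernel.
 */
static bool old_update_refs(const struct model_root *root)
{
	return root->shareable ||
	       root->objectid == MODEL_ROOT_TREE_OBJECTID;
}

/* New check: update refs for everything except the log tree. */
static bool new_update_refs(const struct model_root *root)
{
	return root->objectid != MODEL_TREE_LOG_OBJECTID;
}
```

With a data reloc root that no longer has SHAREABLE set, as after aeb935a45581, old_update_refs() returns false in this model, which corresponds to the leftover extent entries described in the commit message, while new_update_refs() returns true. For a log tree root both checks return false, and for an ordinary shareable fs tree both return true.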