Patch "btrfs: drop the backref cache during relocation if we commit" has been added to the 6.6-stable tree

This is a note to let you know that I've just added the patch titled

    btrfs: drop the backref cache during relocation if we commit

to the 6.6-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     btrfs-drop-the-backref-cache-during-relocation-if-we.patch
and it can be found in the queue-6.6 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 6b98c95b9f5cb72dfb253611015d57f67ce8e133
Author: Josef Bacik <josef@xxxxxxxxxxxxxx>
Date:   Tue Sep 24 16:50:22 2024 -0400

    btrfs: drop the backref cache during relocation if we commit
    
    [ Upstream commit db7e68b522c01eb666cfe1f31637775f18997811 ]
    
    Since the inception of relocation we have maintained the backref cache
    across transaction commits, updating the backref cache with the new
    bytenr whenever we COWed blocks that were in the cache, and then
    updating their bytenr once we detected a transaction id change.
    
    This works as long as we're only ever modifying blocks, not changing the
    structure of the tree.
    
    However relocation does in fact change the structure of the tree.  For
    example, if we are relocating a data extent, we will look up all the
    leaves that point to this data extent.  We will then call
    do_relocation() on each of these leaves, which will COW down to the leaf
    and then update the file extent location.
    
    But, a key feature of do_relocation() is the pending list.  This is all
    the pending nodes that we modified when we updated the file extent item.
    We will then process all of these blocks via finish_pending_nodes, which
    calls do_relocation() on all of the nodes that led up to that leaf.
    
    The purpose of this is to make sure we don't break sharing unless we
    absolutely have to.  Consider the case that we have 3 snapshots that all
    point to this leaf through the same nodes, the initial COW would have
    created a whole new path.  If we did this for all 3 snapshots we would
    end up with 3x the number of nodes we had originally.  To avoid this we
    will cycle through each of the snapshots that point to each of these
    nodes and update their pointers to point at the new nodes.
    
    Once we update the pointer to the new node we will drop the node we
    removed the link for and all of its children via btrfs_drop_subtree().
    This is essentially just btrfs_drop_snapshot(), but for an arbitrary
    point in the snapshot.
    
    The problem with this is that we will never reflect this in the backref
    cache.  If we do this btrfs_drop_snapshot() for a node that is in the
    backref tree, we will leave the node in the backref tree.  This becomes
    a problem when we change the transid, as now the backref cache has
    entire subtrees that no longer exist, but exist as if they still are
    pointed to by the same roots.
    
    In the best case scenario you end up with "adding refs to an existing
    tree ref" errors from insert_inline_extent_backref(), where we attempt
    to link in nodes on roots that are no longer valid.
    
    In the worst case you will double free some random block and re-use it
    while there are still references to the block.
    
    This is extremely subtle, and the consequences are quite bad.  There
    isn't a way to make sure our backref cache is consistent between
    transids.
    
    In order to fix this we need to simply evict the entire backref cache
    any time we cross transids.  This reduces performance in that we have to
    rebuild this backref cache every time we change transids, but fixes the
    bug.
    
    This has existed since relocation was added, and is a pretty critical
    bug.  There's a lot more cleanup that can be done now that this
    functionality is going away, but this patch is as small as possible in
    order to fix the problem and make it easy for us to backport it to all
    the kernels it needs to be backported to.
    
    Followup series will dismantle more of this code and simplify relocation
    drastically to remove this functionality.
    
    We have a reproducer that reproduced the corruption within a few minutes
    of running.  With this patch it survives several iterations/hours of
    running the reproducer.
    
    Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
    CC: stable@xxxxxxxxxxxxxxx
    Reviewed-by: Boris Burkov <boris@xxxxxx>
    Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
    Signed-off-by: David Sterba <dsterba@xxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
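
Editorial illustration, not part of the upstream commit: the fix described
above can be modelled in userspace as a cache that is thrown away whenever
the transaction id it was built under changes. This is only a minimal sketch.
The toy_* names and the singly linked list are hypothetical stand-ins, not
the kernel's struct btrfs_backref_cache or its helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached backref node; not the kernel struct. */
struct toy_node {
	uint64_t bytenr;
	struct toy_node *next;
};

/* Hypothetical stand-in for the backref cache. */
struct toy_cache {
	uint64_t last_trans;	/* transid the cached nodes were built under */
	struct toy_node *nodes;	/* singly linked list of cached nodes */
	size_t nr_nodes;
};

/* Free every cached node; plays the role of releasing the whole cache. */
static void toy_cache_release(struct toy_cache *cache)
{
	while (cache->nodes) {
		struct toy_node *node = cache->nodes;

		cache->nodes = node->next;
		free(node);
	}
	cache->nr_nodes = 0;
}

/* Cache one block. Returns 0 on success, -1 on allocation failure. */
static int toy_cache_add(struct toy_cache *cache, uint64_t bytenr)
{
	struct toy_node *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->bytenr = bytenr;
	node->next = cache->nodes;
	cache->nodes = node;
	cache->nr_nodes++;
	return 0;
}

/*
 * Call before using the cache under a transaction. If the transid differs
 * from the one the cache was built under, nothing in it can be trusted, so
 * drop everything instead of trying to patch individual entries.
 */
static void toy_cache_sync(struct toy_cache *cache, uint64_t transid)
{
	if (cache->last_trans != transid)
		toy_cache_release(cache);
	cache->last_trans = transid;
}
```

In this model, calling toy_cache_sync() twice with the same transid keeps the
cached nodes, while a call with a new transid empties the cache and records
the new id. The cost is the one the commit message names: the cache has to be
rebuilt after every transid change.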

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index df223ebf2551c..a2ba1c7fc16af 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3098,10 +3098,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
 		btrfs_backref_cleanup_node(cache, node);
 	}
 
-	cache->last_trans = 0;
-
-	for (i = 0; i < BTRFS_MAX_LEVEL; i++)
-		ASSERT(list_empty(&cache->pending[i]));
+	for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+		while (!list_empty(&cache->pending[i])) {
+			node = list_first_entry(&cache->pending[i],
+						struct btrfs_backref_node,
+						list);
+			btrfs_backref_cleanup_node(cache, node);
+		}
+	}
 	ASSERT(list_empty(&cache->pending_edge));
 	ASSERT(list_empty(&cache->useless_node));
 	ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 6e590da98742b..299eac696eb42 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -235,70 +235,6 @@ static struct btrfs_backref_node *walk_down_backref(
 	return NULL;
 }
 
-static void update_backref_node(struct btrfs_backref_cache *cache,
-				struct btrfs_backref_node *node, u64 bytenr)
-{
-	struct rb_node *rb_node;
-	rb_erase(&node->rb_node, &cache->rb_root);
-	node->bytenr = bytenr;
-	rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
-	if (rb_node)
-		btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
-				struct btrfs_backref_cache *cache)
-{
-	struct btrfs_backref_node *node;
-	int level = 0;
-
-	if (cache->last_trans == 0) {
-		cache->last_trans = trans->transid;
-		return 0;
-	}
-
-	if (cache->last_trans == trans->transid)
-		return 0;
-
-	/*
-	 * detached nodes are used to avoid unnecessary backref
-	 * lookup. transaction commit changes the extent tree.
-	 * so the detached nodes are no longer useful.
-	 */
-	while (!list_empty(&cache->detached)) {
-		node = list_entry(cache->detached.next,
-				  struct btrfs_backref_node, list);
-		btrfs_backref_cleanup_node(cache, node);
-	}
-
-	while (!list_empty(&cache->changed)) {
-		node = list_entry(cache->changed.next,
-				  struct btrfs_backref_node, list);
-		list_del_init(&node->list);
-		BUG_ON(node->pending);
-		update_backref_node(cache, node, node->new_bytenr);
-	}
-
-	/*
-	 * some nodes can be left in the pending list if there were
-	 * errors during processing the pending nodes.
-	 */
-	for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
-		list_for_each_entry(node, &cache->pending[level], list) {
-			BUG_ON(!node->pending);
-			if (node->bytenr == node->new_bytenr)
-				continue;
-			update_backref_node(cache, node, node->new_bytenr);
-		}
-	}
-
-	cache->last_trans = 0;
-	return 1;
-}
-
 static bool reloc_root_is_dead(const struct btrfs_root *root)
 {
 	/*
@@ -557,9 +493,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
 	struct btrfs_backref_edge *new_edge;
 	struct rb_node *rb_node;
 
-	if (cache->last_trans > 0)
-		update_backref_cache(trans, cache);
-
 	rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
 	if (rb_node) {
 		node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3682,11 +3615,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
 			break;
 		}
 restart:
-		if (update_backref_cache(trans, &rc->backref_cache)) {
-			btrfs_end_transaction(trans);
-			trans = NULL;
-			continue;
-		}
+		if (rc->backref_cache.last_trans != trans->transid)
+			btrfs_backref_release_cache(&rc->backref_cache);
+		rc->backref_cache.last_trans = trans->transid;
 
 		ret = find_next_extent(rc, path, &key);
 		if (ret < 0)
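
Editorial illustration, not part of the patch: the backref.c hunk above
replaces an assertion that the per-level pending lists are empty with a loop
that drains them. A rough userspace sketch of that drain pattern follows. The
toy_* list helpers and TOY_MAX_LEVEL are hypothetical stand-ins for the
kernel's list API and BTRFS_MAX_LEVEL, and the sketch assumes the cleanup
helper unlinks the node from its list, which a loop of this shape needs in
order to terminate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_LEVEL 8	/* illustrative stand-in for BTRFS_MAX_LEVEL */

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

static int toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}

static void toy_list_add_tail(struct toy_list *entry, struct toy_list *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void toy_list_del(struct toy_list *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry;
	entry->prev = entry;
}

/* Hypothetical pending node; only the list linkage matters here. */
struct toy_pending_node {
	int level;
	struct toy_list list;
};

/* Recover the containing node from the first list entry. */
#define toy_first_entry(head) \
	((struct toy_pending_node *)((char *)(head)->next - \
		offsetof(struct toy_pending_node, list)))

/* Unlink and free one node. The unlink is what lets the drain loop end. */
static void toy_cleanup_node(struct toy_pending_node *node)
{
	toy_list_del(&node->list);
	free(node);
}

/* Drain every level's pending list; returns how many nodes were freed. */
static size_t toy_drain_pending(struct toy_list pending[TOY_MAX_LEVEL])
{
	size_t freed = 0;

	for (int i = 0; i < TOY_MAX_LEVEL; i++) {
		while (!toy_list_empty(&pending[i])) {
			toy_cleanup_node(toy_first_entry(&pending[i]));
			freed++;
		}
	}
	return freed;
}
```

Draining rather than asserting matters once the release path can run in the
middle of relocation: nodes may legitimately still sit on a pending list at
that point, so an emptiness assertion would no longer hold.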



