Patch "ceph: do not update snapshot context when there is no new snapshot" has been added to the 5.4-stable tree

This is a note to let you know that I've just added the patch titled

    ceph: do not update snapshot context when there is no new snapshot

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ceph-do-not-update-snapshot-context-when-there-is-no.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit d691daac90063d3dbc8944da11201a85a5e3b90c
Author: Xiubo Li <xiubli@xxxxxxxxxx>
Date:   Sat Feb 19 14:28:33 2022 +0800

    ceph: do not update snapshot context when there is no new snapshot
    
    [ Upstream commit 2e586641c950e7f3e7e008404bd783a466b9b590 ]
    
    We will only track the uppermost parent snapshot realm from which we
    need to rebuild the snapshot contexts _downward_ in the hierarchy. For
    all the other realms, which have no new snapshot, we will do nothing.
    
    This fix will avoid calling ceph_queue_cap_snap() on some inodes
    inappropriately. For example, with the code in mainline, suppose there
    are 2 directory hierarchies (with 6 directories total), like this:
    
    /dir_X1/dir_X2/dir_X3/
    /dir_Y1/dir_Y2/dir_Y3/
    
    First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a
    root snapshot under /.snap/root_snap. From then on, every time we make
    a snapshot under /dir_Y1/..., the kclient will always try to rebuild
    the snap context for the snap_X2 realm and then try to queue cap snaps
    for dir_Y2 and dir_Y3, which makes no sense.
    
    That's because snap_X2's seq is 2 and root_snap's seq is 3. So when
    creating a new snapshot under /dir_Y1/..., the new seq will be 4, and
    the mds will send the kclient a snapshot backtrace in _downward_
    order: seqs 4, 3.
    
    When ceph_update_snap_trace() is called, it will always rebuild from the
    last realm in the trace, that is, the root realm (root_snap). So later,
    when rebuilding the snap contexts, the current logic will always rebuild
    the snap_X2 realm as well and then try to queue cap snaps for all the
    inodes in that realm, even though that is not necessary.
    
    This is accompanied by a lot of these sorts of dout messages:
    
        "ceph:  queue_cap_snap 00000000a42b796b nothing dirty|writing"
    
    Fix the logic to avoid this situation.
    
    Also, the word 'invalidate' is not precise here. In actuality, it
    triggers a rebuild of the existing snapshot contexts, or builds ones
    that don't yet exist. Rename it to 'rebuild_snapcs'.
    
    URL: https://tracker.ceph.com/issues/44100
    Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
    Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
    Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
    Stable-dep-of: 51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index e1b9b224fcb2..f2731f9efc31 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -696,7 +696,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	__le64 *prior_parent_snaps;        /* encoded */
 	struct ceph_snap_realm *realm = NULL;
 	struct ceph_snap_realm *first_realm = NULL;
-	int invalidate = 0;
+	struct ceph_snap_realm *realm_to_rebuild = NULL;
+	int rebuild_snapcs;
 	int err = -ENOMEM;
 	LIST_HEAD(dirty_realms);
 
@@ -704,6 +705,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 
 	dout("update_snap_trace deletion=%d\n", deletion);
 more:
+	rebuild_snapcs = 0;
 	ceph_decode_need(&p, e, sizeof(*ri), bad);
 	ri = p;
 	p += sizeof(*ri);
@@ -727,7 +729,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 	err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
 	if (err < 0)
 		goto fail;
-	invalidate += err;
+	rebuild_snapcs += err;
 
 	if (le64_to_cpu(ri->seq) > realm->seq) {
 		dout("update_snap_trace updating %llx %p %lld -> %lld\n",
@@ -752,22 +754,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
 		if (realm->seq > mdsc->last_snap_seq)
 			mdsc->last_snap_seq = realm->seq;
 
-		invalidate = 1;
+		rebuild_snapcs = 1;
 	} else if (!realm->cached_context) {
 		dout("update_snap_trace %llx %p seq %lld new\n",
 		     realm->ino, realm, realm->seq);
-		invalidate = 1;
+		rebuild_snapcs = 1;
 	} else {
 		dout("update_snap_trace %llx %p seq %lld unchanged\n",
 		     realm->ino, realm, realm->seq);
 	}
 
-	dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
-	     realm, invalidate, p, e);
+	dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino,
+	     realm, rebuild_snapcs, p, e);
 
-	/* invalidate when we reach the _end_ (root) of the trace */
-	if (invalidate && p >= e)
-		rebuild_snap_realms(realm, &dirty_realms);
+	/*
+	 * this will always track the uppest parent realm from which
+	 * we need to rebuild the snapshot contexts _downward_ in
+	 * hierarchy.
+	 */
+	if (rebuild_snapcs)
+		realm_to_rebuild = realm;
+
+	/* rebuild_snapcs when we reach the _end_ (root) of the trace */
+	if (realm_to_rebuild && p >= e)
+		rebuild_snap_realms(realm_to_rebuild, &dirty_realms);
 
 	if (!first_realm)
 		first_realm = realm;
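
As a rough illustration of the logic change described in the commit message,
here is a heavily simplified, standalone sketch of the decision that
ceph_update_snap_trace() makes about which realm to rebuild. It is not the
kernel code: struct realm, the 'changed' flag, rebuild_downward() and
update_trace_old()/update_trace_new() below are hypothetical stand-ins for
the real fs/ceph/snap.c structures and helpers (ceph_snap_realm, the
seq/parent updates and rebuild_snap_realms()).

    /*
     * Illustrative sketch only -- NOT the kernel code.  It models the
     * decision the patch changes in ceph_update_snap_trace(): which realm
     * the snapshot-context rebuild starts from.
     */
    #include <stdbool.h>
    #include <stdio.h>

    struct realm {
            const char *name;
            unsigned long long seq;
            bool changed;           /* seq bumped or parent adjusted */
            struct realm *next;     /* next realm in the MDS trace (toward root) */
    };

    /* Stand-in for rebuild_snap_realms(): rebuild this realm and its children. */
    static void rebuild_downward(const struct realm *r)
    {
            printf("rebuild snap contexts downward from %s (seq %llu)\n",
                   r->name, r->seq);
    }

    /*
     * Old behaviour: any change anywhere in the trace triggers a rebuild
     * from the *last* realm of the trace (the root), touching unrelated
     * subtrees as well.
     */
    static void update_trace_old(struct realm *trace)
    {
            bool invalidate = false;
            struct realm *r, *last = NULL;

            for (r = trace; r; r = r->next) {
                    if (r->changed)
                            invalidate = true;
                    last = r;
            }
            if (invalidate && last)
                    rebuild_downward(last);
    }

    /*
     * New behaviour: remember the uppermost realm that actually changed
     * and rebuild only from there, so realms with no new snapshot are
     * left alone.
     */
    static void update_trace_new(struct realm *trace)
    {
            struct realm *r, *realm_to_rebuild = NULL;

            for (r = trace; r; r = r->next)
                    if (r->changed)
                            realm_to_rebuild = r;   /* trace runs child -> root */

            if (realm_to_rebuild)
                    rebuild_downward(realm_to_rebuild);
    }

    int main(void)
    {
            /*
             * Scenario from the commit message: a new snapshot under
             * /dir_Y1/... bumps that realm to seq 4; the root realm
             * (root_snap, seq 3) is unchanged, as is snap_X2 (seq 2).
             */
            struct realm root   = { "root realm (root_snap)", 3, false, NULL };
            struct realm dir_y1 = { "dir_Y1 realm",            4, true,  &root };

            update_trace_old(&dir_y1);      /* rebuilds from the root realm */
            update_trace_new(&dir_y1);      /* rebuilds only the dir_Y1 realm */
            return 0;
    }

For the scenario above (a new snapshot under /dir_Y1/..., so only that
realm's seq changes), the old variant rebuilds from the root realm and thus
also touches the unrelated dir_X realms, while the new variant rebuilds only
from the dir_Y1 realm.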


