On 2/17/22 11:28 PM, Yan, Zheng wrote:
On Thu, Feb 17, 2022 at 6:55 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
On Thu, 2022-02-17 at 11:03 +0800, Yan, Zheng wrote:
On Tue, Feb 15, 2022 at 11:04 PM <xiubli@xxxxxxxxxx> wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>
There is no need to update the snapshot context in either of the
following two cases:
1: if my context seq matches realm's seq and realm has no parent.
2: if my context seq equals or is larger than my parent's, this
works because rebuild_snap_realms() works _downward_ in
hierarchy after each update.
This fix avoids inodes needlessly calling ceph_queue_cap_snap(),
for example:
There are 6 directories like:
/dir_X1/dir_X2/dir_X3/
/dir_Y1/dir_Y2/dir_Y3/
First, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then
make a root snapshot under /.snap/root_snap. After that, every time
we make a snapshot under /dir_Y1/..., the kclient will try to
rebuild the snap context for the snap_X2 realm and will end up
trying to queue cap snaps for dir_Y2 and dir_Y3, which makes
no sense.
That's because snap_X2's seq is 2 and root_snap's seq is 3.
So when creating a new snapshot under /dir_Y1/... the new seq
will be 4, and the mds will then send the kclient a snapshot backtrace
_downward_ in hierarchy: seqs 4, 3. Then ceph_update_snap_trace()
will always rebuild from the last realm, that's the root_snap.
So later when rebuilding the snap context it will always rebuild
the snap_X2 realm and then try to queue cap snaps for all the inodes
in the snap_X2 realm, and we see logs like:
"ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing"
URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
fs/ceph/snap.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index d075d3ce5f6d..1f24a5de81e7 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -341,14 +341,16 @@ static int build_snap_context(struct ceph_snap_realm *realm,
num += parent->cached_context->num_snaps;
}
- /* do i actually need to update? not if my context seq
- matches realm seq, and my parents' does to. (this works
- because we rebuild_snap_realms() works _downward_ in
- hierarchy after each update.) */
+ /* do i actually need to update? No need when any of the following
+ * two cases:
+ * #1: if my context seq matches realm's seq and realm has no parent.
+ * #2: if my context seq equals or is larger than my parent's, this
+ * works because we rebuild_snap_realms() works _downward_ in
+ * hierarchy after each update.
+ */
if (realm->cached_context &&
- realm->cached_context->seq == realm->seq &&
- (!parent ||
- realm->cached_context->seq >= parent->cached_context->seq)) {
+ ((realm->cached_context->seq == realm->seq && !parent) ||
+ (parent && realm->cached_context->seq >= parent->cached_context->seq))) {
With this change, when you mksnap on /dir_Y1/, its snap context stays
unchanged. In ceph_update_snap_trace(), resetting the 'invalidate'
variable for each realm should fix this issue.
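
To make the concern above concrete, here is a minimal userspace sketch that puts the old and the new "unchanged" checks from the diff side by side. The structs are simplified stand-ins for the kernel ones, and the seq values are assumptions chosen for illustration: snap_X2's cached context is assumed to carry seq 3 (inherited from root_snap), and dir_Y1's realm is assumed to have been bumped to seq 5 by a second mksnap while its cached context still carries seq 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields the check reads are modelled. */
struct snap_context {
    unsigned long long seq;
};

struct snap_realm {
    unsigned long long seq;
    struct snap_context *cached_context;
    struct snap_realm *parent;
};

/* The check as it reads before the patch. */
static bool unchanged_old(const struct snap_realm *realm)
{
    const struct snap_realm *parent = realm->parent;

    return realm->cached_context &&
           realm->cached_context->seq == realm->seq &&
           (!parent ||
            realm->cached_context->seq >= parent->cached_context->seq);
}

/* The check as it reads after the patch. */
static bool unchanged_new(const struct snap_realm *realm)
{
    const struct snap_realm *parent = realm->parent;

    return realm->cached_context &&
           ((realm->cached_context->seq == realm->seq && !parent) ||
            (parent &&
             realm->cached_context->seq >= parent->cached_context->seq));
}

/* Hypothetical realm states; all seq values are illustrative assumptions. */
void compare_checks_example(void)
{
    struct snap_context root_ctx = { .seq = 3 };
    struct snap_realm root = { .seq = 3, .cached_context = &root_ctx };

    /* snap_X2 realm: own seq 2, cached context assumed to carry seq 3. */
    struct snap_context x2_ctx = { .seq = 3 };
    struct snap_realm x2 = { .seq = 2, .cached_context = &x2_ctx,
                             .parent = &root };
    assert(!unchanged_old(&x2)); /* old check rebuilds every time */
    assert(unchanged_new(&x2));  /* new check skips the rebuild */

    /* dir_Y1 realm after another mksnap: seq moved to 5, cache still 4. */
    struct snap_context y1_ctx = { .seq = 4 };
    struct snap_realm y1 = { .seq = 5, .cached_context = &y1_ctx,
                             .parent = &root };
    assert(!unchanged_old(&y1)); /* old check rebuilds, as it should */
    assert(unchanged_new(&y1));  /* new check skips although seq moved */
}
```

Under these assumed values the new check removes the needless rebuild of the snap_X2 realm, but it also treats the dir_Y1 realm as unchanged after its own seq has advanced, which is the case described above.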
Thanks Zheng for your feedback.
Yeah, there is one case where this will happen. Your approach is
simpler; I will post a V2 for this.
-- Xiubo
This comment is terribly vague. "invalidate" is a local variable in that
function and isn't set on a per-realm basis.
Could you suggest a patch on top of Xiubo's patch instead?
something like this (not tested)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index af502a8245f0..6ef41764008b 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -704,7 +704,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
__le64 *prior_parent_snaps; /* encoded */
struct ceph_snap_realm *realm = NULL;
struct ceph_snap_realm *first_realm = NULL;
- int invalidate = 0;
+ struct ceph_snap_realm *realm_to_inval = NULL;
+ int invalidate;
int err = -ENOMEM;
LIST_HEAD(dirty_realms);
@@ -712,6 +713,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("update_snap_trace deletion=%d\n", deletion);
more:
+ invalidate = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
p += sizeof(*ri);
@@ -774,8 +776,10 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
realm, invalidate, p, e);
/* invalidate when we reach the _end_ (root) of the trace */
- if (invalidate && p >= e)
- rebuild_snap_realms(realm, &dirty_realms);
+ if (invalidate)
+ realm_to_inval = realm;
+ if (realm_to_inval && p >= e)
+ rebuild_snap_realms(realm_to_inval, &dirty_realms);
if (!first_realm)
first_realm = realm;
dout("build_snap_context %llx %p: %p seq %lld (%u snaps),
" (unchanged)\n",
realm->ino, realm, realm->cached_context,
--
2.27.0
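
The effect of the per-realm reset in the untested ceph_update_snap_trace() diff can be sketched in plain C as follows. This is only a model under stated assumptions: the trace is represented as an array of hypothetical records ordered from the changed realm towards the root, and a `changed` flag stands in for whatever condition sets `invalidate` for a record in the real function. The two helpers return the index of the realm the rebuild would start from, or -1 if nothing is rebuilt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one record in the snap trace. */
struct trace_record {
    const char *name;
    bool changed; /* stands in for "invalidate was set for this realm" */
};

/* Sticky flag: once set it stays set, and the rebuild starts at the
 * last record of the trace (the root). */
static int rebuild_start_sticky(const struct trace_record *t, size_t n)
{
    bool invalidate = false;
    int start = -1;

    for (size_t i = 0; i < n; i++) {
        if (t[i].changed)
            invalidate = true;
        if (invalidate && i + 1 == n)
            start = (int)i;
    }
    return start;
}

/* Per-realm flag: reset for every record, and the last realm that was
 * actually invalidated is remembered as the rebuild start. */
static int rebuild_start_per_realm(const struct trace_record *t, size_t n)
{
    int realm_to_inval = -1;
    int start = -1;

    for (size_t i = 0; i < n; i++) {
        bool invalidate = t[i].changed; /* fresh value per record */

        if (invalidate)
            realm_to_inval = (int)i;
        if (realm_to_inval >= 0 && i + 1 == n)
            start = realm_to_inval;
    }
    return start;
}

/* Trace for a mksnap under /dir_Y1/: seqs 4 (new), then 3 (root). */
void trace_example(void)
{
    const struct trace_record trace[] = {
        { "dir_Y1 realm", true },
        { "root realm", false },
    };

    assert(rebuild_start_sticky(trace, 2) == 1);    /* starts at root */
    assert(rebuild_start_per_realm(trace, 2) == 0); /* starts at dir_Y1 */
}
```

In this model the sticky flag makes the rebuild start at the root, which also walks into the snap_X2 realm, while the per-realm variant starts at the dir_Y1 realm and leaves snap_X2 alone.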
--
Jeff Layton <jlayton@xxxxxxxxxx>