Xiubo Li <xiubli@xxxxxxxxxx> writes: > On 2/18/22 10:47 AM, xiubli@xxxxxxxxxx wrote: >> From: Xiubo Li <xiubli@xxxxxxxxxx> >> >> We will only track the uppest parent snapshot realm from which we >> need to rebuild the snapshot contexts _downward_ in hierarchy. For >> all the others having no new snapshot we will do nothing. >> >> This fix will avoid calling ceph_queue_cap_snap() on some inodes >> inappropriately. For example, with the code in mainline, suppose there >> are 2 directory hierarchies (with 6 directories total), like this: >> >> /dir_X1/dir_X2/dir_X3/ >> /dir_Y1/dir_Y2/dir_Y3/ >> >> Firstly, make a snapshot under /dir_X1/dir_X2/.snap/snap_X2, then make a >> root snapshot under /.snap/root_snap. Every time we make snapshots under >> /dir_Y1/..., the kclient will always try to rebuild the snap context for >> snap_X2 realm and finally will always try to queue cap snaps for dir_Y2 >> and dir_Y3, which makes no sense. >> >> That's because the snap_X2's seq is 2 and root_snap's seq is 3. So when >> creating a new snapshot under /dir_Y1/... the new seq will be 4, and >> the mds will send the kclient a snapshot backtrace in _downward_ >> order: seqs 4, 3. >> >> When ceph_update_snap_trace() is called, it will always rebuild the from >> the last realm, that's the root_snap. So later when rebuilding the snap >> context, the current logic will always cause it to rebuild the snap_X2 >> realm and then try to queue cap snaps for all the inodes related in that >> realm, even though it's not necessary. >> >> This is accompanied by a lot of these sorts of dout messages: >> >> "ceph: queue_cap_snap 00000000a42b796b nothing dirty|writing" >> >> Fix the logic to avoid this situation. >> >> The 'invalidate' word is not precise here, acutally it will rebuild >> the snapshot existing contexts or just build none-existing ones, >> rename it to 'rebuild_snapcs'. >> >> URL: https://tracker.ceph.com/issues/44100 >> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx> >> --- >> >> Changed in V2: >> - Thanks Zheng's feedback and switched to Zheng's patch. >> - Rename invalidate to rebuild_snapcs. >> >> >> >> fs/ceph/snap.c | 28 +++++++++++++++++++--------- >> 1 file changed, 19 insertions(+), 9 deletions(-) >> >> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c >> index dbf34f212596..6d55b8ba79d8 100644 >> --- a/fs/ceph/snap.c >> +++ b/fs/ceph/snap.c >> @@ -735,7 +735,8 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, >> __le64 *prior_parent_snaps; /* encoded */ >> struct ceph_snap_realm *realm = NULL; >> struct ceph_snap_realm *first_realm = NULL; >> - int invalidate = 0; >> + struct ceph_snap_realm *realm_to_rebuild = NULL; >> + int rebuild_snapcs; >> int err = -ENOMEM; >> LIST_HEAD(dirty_realms); >> @@ -743,6 +744,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, >> dout("update_snap_trace deletion=%d\n", deletion); >> more: >> + rebuild_snapcs = 0; >> ceph_decode_need(&p, e, sizeof(*ri), bad); >> ri = p; >> p += sizeof(*ri); >> @@ -766,7 +768,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, >> err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent)); >> if (err < 0) >> goto fail; >> - invalidate += err; >> + rebuild_snapcs += err; >> if (le64_to_cpu(ri->seq) > realm->seq) { >> dout("update_snap_trace updating %llx %p %lld -> %lld\n", >> @@ -791,22 +793,30 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc, >> if (realm->seq > mdsc->last_snap_seq) >> mdsc->last_snap_seq = realm->seq; >> - invalidate = 1; >> + rebuild_snapcs = 1; >> } else if (!realm->cached_context) { >> dout("update_snap_trace %llx %p seq %lld new\n", >> realm->ino, realm, realm->seq); >> - invalidate = 1; >> + rebuild_snapcs = 1; >> } else { >> dout("update_snap_trace %llx %p seq %lld unchanged\n", >> realm->ino, realm, realm->seq); >> } >> - dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino, >> - realm, invalidate, p, e); >> + dout("done with %llx %p, rebuild_snapcs=%d, %p %p\n", realm->ino, >> + realm, rebuild_snapcs, p, e); >> - /* invalidate when we reach the _end_ (root) of the trace */ >> - if (invalidate && p >= e) >> - rebuild_snap_realms(realm, &dirty_realms); >> + /* >> + * this will always track the uppest parent realm from which >> + * we need to rebuild the snapshot contexts _downward_ in >> + * hierarchy. >> + */ >> + if (rebuild_snapcs) >> + realm_to_rebuild = realm; >> + >> + /* rebuild_snapcs when we reach the _end_ (root) of the trace */ >> + if (rebuild_snapcs && p >= e) > > s/rebuild_snapcs/realm_to_rebuild/ > > This will fix the bug Luís Henriques reported. > > I have sent the V3 to fix it. Thanks. Awesome, thanks Xiubo. I've give it a try today. Cheer, -- Luís > > - Xiubo > > >> + rebuild_snap_realms(realm_to_rebuild, &dirty_realms); >> if (!first_realm) >> first_realm = realm; >