On Tue, 2022-02-15 at 20:23 +0800, xiubli@xxxxxxxxxx wrote:
> From: Xiubo Li <xiubli@xxxxxxxxxx>
> 
> There is one case where the snaprealm has been updated and we then
> iterate all the inodes under it and try to queue a cap snap for each
> of them. But in some cases there are millions of subdirectories or
> files under it, and most of them have no Fw caps or dirty pages and
> will just be skipped.
> 
> URL: https://tracker.ceph.com/issues/44100
> Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
> ---
>  fs/ceph/snap.c | 37 +++++++++++++++++++++++++++----------
>  1 file changed, 27 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
> index c787775eaf2a..d075d3ce5f6d 100644
> --- a/fs/ceph/snap.c
> +++ b/fs/ceph/snap.c
> @@ -477,19 +477,21 @@ static bool has_new_snaps(struct ceph_snap_context *o,
>  static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
>  {
>          struct inode *inode = &ci->vfs_inode;
> -        struct ceph_cap_snap *capsnap;
> +        struct ceph_cap_snap *capsnap = NULL;
>          struct ceph_snap_context *old_snapc, *new_snapc;
>          struct ceph_buffer *old_blob = NULL;
>          int used, dirty;
> -
> -        capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_NOFS);
> -        if (!capsnap) {
> -                pr_err("ENOMEM allocating ceph_cap_snap on %p\n", inode);
> -                return;
> +        bool need_flush = false;
> +        bool atomic_alloc_mem_failed = false;
> +
> +retry:
> +        if (unlikely(atomic_alloc_mem_failed)) {
> +                capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_NOFS);
> +                if (!capsnap) {
> +                        pr_err("ENOMEM allocating ceph_cap_snap on %p\n", inode);
> +                        return;
> +                }
>          }
> -        capsnap->cap_flush.is_capsnap = true;
> -        INIT_LIST_HEAD(&capsnap->cap_flush.i_list);
> -        INIT_LIST_HEAD(&capsnap->cap_flush.g_list);
> 
>          spin_lock(&ci->i_ceph_lock);
>          used = __ceph_caps_used(ci);
> @@ -532,7 +534,7 @@ static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
>           */
>          if (has_new_snaps(old_snapc, new_snapc)) {
>                  if (dirty & (CEPH_CAP_ANY_EXCL|CEPH_CAP_FILE_WR))
> -                        capsnap->need_flush = true;
> +                        need_flush = true;
>          } else {
>                  if (!(used & CEPH_CAP_FILE_WR) &&
>                      ci->i_wrbuffer_ref_head == 0) {
> @@ -542,6 +544,21 @@ static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
>                  }
>          }
> 
> +        if (!capsnap) {
> +                capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_ATOMIC);
> +                if (unlikely(!capsnap)) {
> +                        pr_err("ENOMEM atomic allocating ceph_cap_snap on %p\n",
> +                               inode);
> +                        spin_unlock(&ci->i_ceph_lock);
> +                        atomic_alloc_mem_failed = true;
> +                        goto retry;
> +                }
> +        }
> +        capsnap->need_flush = need_flush;
> +        capsnap->cap_flush.is_capsnap = true;
> +        INIT_LIST_HEAD(&capsnap->cap_flush.i_list);
> +        INIT_LIST_HEAD(&capsnap->cap_flush.g_list);
> +
>          dout("queue_cap_snap %p cap_snap %p queuing under %p %s %s\n",
>               inode, capsnap, old_snapc, ceph_cap_string(dirty),
>               capsnap->need_flush ? "" : "no_flush");

I'm not so thrilled with this patch. First, are you sure you want
GFP_ATOMIC here? Something like GFP_NOWAIT may be better: since you
have a fallback anyway, the kernel can still make forward progress on
reclaim if the allocation returns NULL.

That said, this is pretty kludgey. I'd much prefer to see something
that didn't require this sort of hack. Maybe instead you could have
queue_realm_cap_snaps do the allocation and pass a
(struct ceph_cap_snap **) pointer in, and ceph_queue_cap_snap could set
it to NULL if it ends up consuming the allocation? That way, we still
don't do the allocation under the spinlock, and you only end up
allocating the number you actually need (plus maybe one or two extra on
the edges).
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
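
For illustration, the caller-allocates approach described above might look
roughly like the sketch below. This is an untested simplification, not a
real patch: the exact signatures are an assumption, the locking and inode
reference handling around the realm walk are elided, all of the existing
decision logic inside ceph_queue_cap_snap() is reduced to comments, and
kmem_cache_zalloc() is used here just to keep the sketch short.

static void ceph_queue_cap_snap(struct ceph_inode_info *ci,
                                struct ceph_cap_snap **pcapsnap)
{
        struct ceph_cap_snap *capsnap = *pcapsnap;

        spin_lock(&ci->i_ceph_lock);

        /*
         * All of the existing "nothing to do for this inode" early exits
         * stay as they are; each one just unlocks and returns without
         * touching *pcapsnap, so the caller keeps the allocation for the
         * next inode in the realm.
         */

        capsnap->cap_flush.is_capsnap = true;
        INIT_LIST_HEAD(&capsnap->cap_flush.i_list);
        INIT_LIST_HEAD(&capsnap->cap_flush.g_list);
        /* ... fill in and queue the capsnap as the current code does ... */

        /* Tell the caller the preallocated capsnap was consumed. */
        *pcapsnap = NULL;
        spin_unlock(&ci->i_ceph_lock);
}

static void queue_realm_cap_snaps(struct ceph_snap_realm *realm)
{
        struct ceph_cap_snap *capsnap = NULL;
        struct ceph_inode_info *ci;

        /* igrab()/iput() and inodes_with_caps_lock handling elided */
        list_for_each_entry(ci, &realm->inodes_with_caps, i_snap_realm_item) {
                /* Allocate outside ci->i_ceph_lock, and only when needed. */
                if (!capsnap) {
                        capsnap = kmem_cache_zalloc(ceph_cap_snap_cachep,
                                                    GFP_NOFS);
                        if (!capsnap) {
                                pr_err("ENOMEM allocating ceph_cap_snap\n");
                                break;
                        }
                }
                /* Consumes the allocation only if this inode needs a snap. */
                ceph_queue_cap_snap(ci, &capsnap);
        }

        /* Most inodes are skipped, so there is usually one left over. */
        if (capsnap)
                kmem_cache_free(ceph_cap_snap_cachep, capsnap);
}

With this shape, a realm holding millions of clean inodes costs only as many
allocations as there are capsnaps actually queued (plus at most one spare
that gets freed at the end), and no allocation has to happen under the
spinlock at all, so the GFP_ATOMIC/GFP_NOWAIT question goes away.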