Re: [PATCH 2/3] ceph: move kzalloc under i_ceph_lock with GFP_ATOMIC flag

On 2/16/22 12:57 AM, Jeff Layton wrote:
On Tue, 2022-02-15 at 20:23 +0800, xiubli@xxxxxxxxxx wrote:
From: Xiubo Li <xiubli@xxxxxxxxxx>

There is one case where the snaprealm has been updated and we then
iterate over all the inodes under it, trying to queue a cap snap
for each. But in some cases there are millions of subdirectories
or files under it, and most of them hold no Fw caps and have no
dirty pages, so they will just be skipped.

URL: https://tracker.ceph.com/issues/44100
Signed-off-by: Xiubo Li <xiubli@xxxxxxxxxx>
---
  fs/ceph/snap.c | 37 +++++++++++++++++++++++++++----------
  1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index c787775eaf2a..d075d3ce5f6d 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -477,19 +477,21 @@ static bool has_new_snaps(struct ceph_snap_context *o,
  static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
  {
  	struct inode *inode = &ci->vfs_inode;
-	struct ceph_cap_snap *capsnap;
+	struct ceph_cap_snap *capsnap = NULL;
  	struct ceph_snap_context *old_snapc, *new_snapc;
  	struct ceph_buffer *old_blob = NULL;
  	int used, dirty;
-
-	capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_NOFS);
-	if (!capsnap) {
-		pr_err("ENOMEM allocating ceph_cap_snap on %p\n", inode);
-		return;
+	bool need_flush = false;
+	bool atomic_alloc_mem_failed = false;
+
+retry:
+	if (unlikely(atomic_alloc_mem_failed)) {
+	        capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_NOFS);
+		if (!capsnap) {
+			pr_err("ENOMEM allocating ceph_cap_snap on %p\n", inode);
+			return;
+		}
  	}
-	capsnap->cap_flush.is_capsnap = true;
-	INIT_LIST_HEAD(&capsnap->cap_flush.i_list);
-	INIT_LIST_HEAD(&capsnap->cap_flush.g_list);
  	spin_lock(&ci->i_ceph_lock);
  	used = __ceph_caps_used(ci);
@@ -532,7 +534,7 @@ static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
  	 */
  	if (has_new_snaps(old_snapc, new_snapc)) {
  		if (dirty & (CEPH_CAP_ANY_EXCL|CEPH_CAP_FILE_WR))
-			capsnap->need_flush = true;
+			need_flush = true;
  	} else {
  		if (!(used & CEPH_CAP_FILE_WR) &&
  		    ci->i_wrbuffer_ref_head == 0) {
@@ -542,6 +544,21 @@ static void ceph_queue_cap_snap(struct ceph_inode_info *ci)
  		}
  	}
+	if (!capsnap) {
+	        capsnap = kmem_cache_alloc(ceph_cap_snap_cachep, GFP_ATOMIC);
+		if (unlikely(!capsnap)) {
+			pr_err("ENOMEM atomic allocating ceph_cap_snap on %p\n",
+			       inode);
+			spin_unlock(&ci->i_ceph_lock);
+			atomic_alloc_mem_failed = true;
+			goto retry;
+		}
+	}
+	capsnap->need_flush = need_flush;
+	capsnap->cap_flush.is_capsnap = true;
+	INIT_LIST_HEAD(&capsnap->cap_flush.i_list);
+	INIT_LIST_HEAD(&capsnap->cap_flush.g_list);
+
  	dout("queue_cap_snap %p cap_snap %p queuing under %p %s %s\n",
  	     inode, capsnap, old_snapc, ceph_cap_string(dirty),
  	     capsnap->need_flush ? "" : "no_flush");
I'm not so thrilled with this patch.

First, are you sure you want GFP_ATOMIC here? Something like GFP_NOWAIT
may be better since you have a fallback so the kernel can still make
forward progress on reclaim if this returns NULL.

That said, this is pretty kludgey. I'd much prefer to see something that
didn't require this sort of hack. Maybe instead you could have
queue_realm_cap_snaps do the allocation and pass a (struct ceph_cap_snap
**) pointer in, and it can set the thing to NULL if it ends up using it?

Sounds good, I will switch to this approach in V2.

Thanks.


That way, we still don't do the allocation under spinlock and you only
end up allocating the number you need (plus maybe one or two on the
edges).




