On 2019/12/13 1:31, Jeff Layton wrote:
I believe it's possible that we could end up with racing calls to
__ceph_remove_cap for the same cap. If that happens, the cap->ci
pointer will be zeroed out and we can hit a NULL pointer dereference.
Once we acquire the s_cap_lock, check for the ci pointer being NULL,
and just return without doing anything if it is.
URL: https://tracker.ceph.com/issues/43272
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
fs/ceph/caps.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
This is the only scenario that made sense to me in light of Ilya's
analysis on the tracker above. I could be off here though -- the locking
around this code is horrifically complex, and I could be missing what
should guard against this scenario.
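To make the race and the fix concrete, here is a minimal userspace
sketch of the same pattern (not ceph code; struct fake_cap, remove_cap
and the other names are made up for illustration): two threads race to
remove the same cap, and the loser finds the backpointer already NULLed
under the lock and bails out, which is what the hunk below does with
cap->ci under s_cap_lock.

#include <pthread.h>
#include <stdio.h>

struct fake_inode { int dummy; };

struct fake_cap {
        pthread_mutex_t lock;   /* stands in for session->s_cap_lock */
        struct fake_inode *ci;  /* backpointer, cleared on removal */
};

static void remove_cap(struct fake_cap *cap)
{
        pthread_mutex_lock(&cap->lock);
        if (!cap->ci) {
                /* lost the race: another caller already removed it */
                pthread_mutex_unlock(&cap->lock);
                return;
        }
        /* ... teardown that dereferences cap->ci would go here ... */
        cap->ci = NULL;         /* publish the removal under the lock */
        pthread_mutex_unlock(&cap->lock);
}

static void *racer(void *arg)
{
        remove_cap(arg);
        return NULL;
}

int main(void)
{
        static struct fake_inode inode;
        struct fake_cap cap = {
                .lock = PTHREAD_MUTEX_INITIALIZER,
                .ci = &inode,
        };
        pthread_t a, b;

        /* both threads call remove_cap(); only one does the teardown */
        pthread_create(&a, NULL, racer, &cap);
        pthread_create(&b, NULL, racer, &cap);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("ci after removal: %p\n", (void *)cap.ci);
        return 0;
}

Build with cc -pthread. In the sketch the teardown is only a comment;
in __ceph_remove_cap() the equivalent teardown dereferences ci, which
is why the early NULL check matters.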
Checked the downstream 3.10.0-862.14.4 code; it seems that if
trim_caps_cb() runs while the inode is being destroyed, we could hit
this.

All the __ceph_remove_cap() calls are protected by ci->i_ceph_lock,
except the one on the inode-destruction path. Upstream does not seem
to have this problem now.
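For illustration, the asymmetry described above boils down to the
following (again a made-up userspace sketch, not the actual kernel
call chains; locked_removal and unlocked_destroy are hypothetical
names):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static int the_cap = 1;
static int *cap_state = &the_cap;    /* stands in for the cap */

/* Most removal paths: serialize on the per-inode lock. */
static void *locked_removal(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&inode_lock);  /* like ci->i_ceph_lock */
        cap_state = NULL;
        pthread_mutex_unlock(&inode_lock);
        return NULL;
}

/* Inode-destruction path: no per-inode lock, hence the race window. */
static void *unlocked_destroy(void *arg)
{
        (void)arg;
        cap_state = NULL;   /* can interleave with locked_removal() */
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, locked_removal, NULL);
        pthread_create(&b, NULL, unlocked_destroy, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("cap_state = %p\n", (void *)cap_state);
        return 0;
}

Since the destroy path never takes the per-inode lock, the only lock
both racers are guaranteed to hold is s_cap_lock, which is why the
patch does its recheck there.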
BRs
Thoughts?
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 9d09bb53c1ab..7e39ee8eff60 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
 void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 {
         struct ceph_mds_session *session = cap->session;
-        struct ceph_inode_info *ci = cap->ci;
-        struct ceph_mds_client *mdsc =
-                ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+        struct ceph_inode_info *ci;
+        struct ceph_mds_client *mdsc;
         int removed = 0;
 
+        spin_lock(&session->s_cap_lock);
+        ci = cap->ci;
+        if (!ci) {
+                /*
+                 * Did we race with a competing __ceph_remove_cap call? If
+                 * ci is zeroed out, then just unlock and don't do anything.
+                 * Assume that it's on its way out anyway.
+                 */
+                spin_unlock(&session->s_cap_lock);
+                return;
+        }
+
         dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
 
         /* remove from inode's cap rbtree, and clear auth cap */
@@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
         if (ci->i_auth_cap == cap)
                 ci->i_auth_cap = NULL;
 
-        /* remove from session list */
-        spin_lock(&session->s_cap_lock);
         if (session->s_cap_iterator == cap) {
                 /* not yet, we are iterating over this very cap */
                 dout("__ceph_remove_cap delaying %p removal from session %p\n",
                      cap, cap->session);
         } else {
+                /* remove from session list */
                 list_del_init(&cap->session_caps);
                 session->s_nr_caps--;
                 cap->session = NULL;
@@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
         }
         /* protect backpointer with s_cap_lock: see iterate_session_caps */
         cap->ci = NULL;
+        mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
 
         /*
          * s_cap_reconnect is protected by s_cap_lock. no one changes