On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
> On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > I believe it's possible that we could end up with racing calls to
> > __ceph_remove_cap for the same cap. If that happens, the cap->ci
> > pointer will be zeroed out and we can hit a NULL pointer dereference.
> >
> > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
> > and just return without doing anything if it is.
> >
> > URL: https://tracker.ceph.com/issues/43272
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/caps.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > This is the only scenario that made sense to me in light of Ilya's
> > analysis on the tracker above. I could be off here though -- the locking
> > around this code is horrifically complex, and I could be missing what
> > should guard against this scenario.
> >
>
> I think the simpler fix is, in trim_caps_cb, check if cap->ci is
> non-NULL before calling __ceph_remove_cap(). This should work because
> __ceph_remove_cap() is always called inside i_ceph_lock.
>

Is that sufficient though? The stack trace in the bug shows it being
called by ceph_trim_caps, but I think we could hit the same problem with
other __ceph_remove_cap callers, if they happen to race in at the right
time.

> > Thoughts?
> >
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 9d09bb53c1ab..7e39ee8eff60 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
> >  void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >  {
> >         struct ceph_mds_session *session = cap->session;
> > -       struct ceph_inode_info *ci = cap->ci;
> > -       struct ceph_mds_client *mdsc =
> > -               ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> > +       struct ceph_inode_info *ci;
> > +       struct ceph_mds_client *mdsc;
> >         int removed = 0;
> >
> > +       spin_lock(&session->s_cap_lock);
> > +       ci = cap->ci;
> > +       if (!ci) {
> > +               /*
> > +                * Did we race with a competing __ceph_remove_cap call? If
> > +                * ci is zeroed out, then just unlock and don't do anything.
> > +                * Assume that it's on its way out anyway.
> > +                */
> > +               spin_unlock(&session->s_cap_lock);
> > +               return;
> > +       }
> > +
> >         dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
> >
> >         /* remove from inode's cap rbtree, and clear auth cap */
> > @@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >         if (ci->i_auth_cap == cap)
> >                 ci->i_auth_cap = NULL;
> >
> > -       /* remove from session list */
> > -       spin_lock(&session->s_cap_lock);
> >         if (session->s_cap_iterator == cap) {
> >                 /* not yet, we are iterating over this very cap */
> >                 dout("__ceph_remove_cap delaying %p removal from session %p\n",
> >                      cap, cap->session);
> >         } else {
> > +               /* remove from session list */
> >                 list_del_init(&cap->session_caps);
> >                 session->s_nr_caps--;
> >                 cap->session = NULL;
> > @@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >         }
> >         /* protect backpointer with s_cap_lock: see iterate_session_caps */
> >         cap->ci = NULL;
> > +       mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> >
> >         /*
> >          * s_cap_reconnect is protected by s_cap_lock. no one changes
> > --
> > 2.23.0
> >
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
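
To make the alternative being discussed concrete, here is a rough
sketch of the sort of check Zheng describes for trim_caps_cb -- a
sketch only, not an actual patch, assuming the callback still does its
cap inspection under i_ceph_lock as in mainline at the time, with the
trim logic itself elided:

/*
 * Sketch of the suggested check, for illustration. Looking at cap->ci
 * is only safe here because this runs under i_ceph_lock, the same lock
 * every caller of __ceph_remove_cap() is said to hold.
 */
static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
{
        struct ceph_inode_info *ci = ceph_inode(inode);

        spin_lock(&ci->i_ceph_lock);
        if (!cap->ci) {
                /* raced with __ceph_remove_cap(); cap already detached */
                spin_unlock(&ci->i_ceph_lock);
                return 0;
        }

        /*
         * ... existing logic deciding whether the cap is still needed,
         * ultimately calling __ceph_remove_cap() for droppable caps ...
         */

        spin_unlock(&ci->i_ceph_lock);
        return 0;
}

A NULL cap->ci observed under i_ceph_lock would stay stable for the
rest of the critical section, which is what makes the early return
safe in this one caller; it would not, by itself, protect the other
__ceph_remove_cap callers mentioned above.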