On Sat, 2019-12-14 at 10:46 +0800, Yan, Zheng wrote:
> On Fri, Dec 13, 2019 at 1:32 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > I believe it's possible that we could end up with racing calls to
> > __ceph_remove_cap for the same cap. If that happens, the cap->ci
> > pointer will be zeroed out and we can hit a NULL pointer dereference.
> >
> > Once we acquire the s_cap_lock, check for the ci pointer being NULL,
> > and just return without doing anything if it is.
> >
> > URL: https://tracker.ceph.com/issues/43272
> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > ---
> >  fs/ceph/caps.c | 21 ++++++++++++++++-----
> >  1 file changed, 16 insertions(+), 5 deletions(-)
> >
> > This is the only scenario that made sense to me in light of Ilya's
> > analysis on the tracker above. I could be off here though -- the locking
> > around this code is horrifically complex, and I could be missing what
> > should guard against this scenario.
> >
>
> I think the simpler fix is, in trim_caps_cb, check if cap->ci is
> non-NULL before calling __ceph_remove_cap(). This should work because
> __ceph_remove_cap() is always called inside i_ceph_lock.
>

Is that sufficient though? The stack trace in the bug shows it being
called by ceph_trim_caps, but I think we could hit the same problem with
other __ceph_remove_cap callers, if they happen to race in at the right
time.

> > Thoughts?
> >
> > diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
> > index 9d09bb53c1ab..7e39ee8eff60 100644
> > --- a/fs/ceph/caps.c
> > +++ b/fs/ceph/caps.c
> > @@ -1046,11 +1046,22 @@ static void drop_inode_snap_realm(struct ceph_inode_info *ci)
> >  void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >  {
> >         struct ceph_mds_session *session = cap->session;
> > -       struct ceph_inode_info *ci = cap->ci;
> > -       struct ceph_mds_client *mdsc =
> > -               ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> > +       struct ceph_inode_info *ci;
> > +       struct ceph_mds_client *mdsc;
> >         int removed = 0;
> >
> > +       spin_lock(&session->s_cap_lock);
> > +       ci = cap->ci;
> > +       if (!ci) {
> > +               /*
> > +                * Did we race with a competing __ceph_remove_cap call? If
> > +                * ci is zeroed out, then just unlock and don't do anything.
> > +                * Assume that it's on its way out anyway.
> > +                */
> > +               spin_unlock(&session->s_cap_lock);
> > +               return;
> > +       }
> > +
> >         dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
> >
> >         /* remove from inode's cap rbtree, and clear auth cap */
> > @@ -1058,13 +1069,12 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >         if (ci->i_auth_cap == cap)
> >                 ci->i_auth_cap = NULL;
> >
> > -       /* remove from session list */
> > -       spin_lock(&session->s_cap_lock);
> >         if (session->s_cap_iterator == cap) {
> >                 /* not yet, we are iterating over this very cap */
> >                 dout("__ceph_remove_cap delaying %p removal from session %p\n",
> >                      cap, cap->session);
> >         } else {
> > +               /* remove from session list */
> >                 list_del_init(&cap->session_caps);
> >                 session->s_nr_caps--;
> >                 cap->session = NULL;
> > @@ -1072,6 +1082,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
> >         }
> >         /* protect backpointer with s_cap_lock: see iterate_session_caps */
> >         cap->ci = NULL;
> > +       mdsc = ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
> >
> >         /*
> >          * s_cap_reconnect is protected by s_cap_lock. no one changes
> > --
> > 2.23.0
> >
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
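
To make the alternative being discussed concrete, here is a rough
sketch of the sort of check Zheng describes for trim_caps_cb -- a
sketch only, not an actual patch, assuming the callback still does its
cap inspection under i_ceph_lock as in mainline at the time, with the
trim logic itself elided:

/*
 * Sketch of the suggested check, for illustration. Looking at cap->ci
 * is only safe here because this runs under i_ceph_lock, the same lock
 * every caller of __ceph_remove_cap() is said to hold.
 */
static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg)
{
        struct ceph_inode_info *ci = ceph_inode(inode);

        spin_lock(&ci->i_ceph_lock);
        if (!cap->ci) {
                /* raced with __ceph_remove_cap(); cap already detached */
                spin_unlock(&ci->i_ceph_lock);
                return 0;
        }

        /*
         * ... existing logic deciding whether the cap is still needed,
         * ultimately calling __ceph_remove_cap() for droppable caps ...
         */

        spin_unlock(&ci->i_ceph_lock);
        return 0;
}

A NULL cap->ci observed under i_ceph_lock would stay stable for the
rest of the critical section, which is what makes the early return
safe in this one caller; it would not, by itself, protect the other
__ceph_remove_cap callers mentioned above.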