Re: [PATCH] ceph: always clone dentry name when building MDS request

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 17, 2019 at 9:05 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Wed, Apr 17, 2019 at 7:48 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 16, 2019 at 10:14 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >
> > > On Tue, Apr 16, 2019 at 10:18 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Apr 16, 2019 at 7:10 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Apr 15, 2019 at 10:24 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Apr 16, 2019 at 12:47 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Ben reported tripping the BUG_ON in create_request_message during some
> > > > > > > performance testing. Analysis of the vmcore showed that the length of
> > > > > > > the r_dentry->d_name string changed after we allocated the buffer, but
> > > > > > > before we encoded it.
> > > > > > >
> > > > > > > build_dentry_path returns pointers to d_name in the common case of
> > > > > > > non-snapped dentries, but this optimization isn't safe. Instead, make a
> > > > > > > copy of the string in this case. That is less efficient, but safe.
> > > > > > >
> > > > > > > Reported-by: Ben England <bengland@xxxxxxxxxx>
> > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  fs/ceph/mds_client.c | 40 ++++++++++++++++++++++++++++++++--------
> > > > > > >  1 file changed, 32 insertions(+), 8 deletions(-)
> > > > > > >
> > > > > > > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > > > > > > index e09feadd2ae1..4cfefe118128 100644
> > > > > > > --- a/fs/ceph/mds_client.c
> > > > > > > +++ b/fs/ceph/mds_client.c
> > > > > > > @@ -2159,9 +2159,35 @@ char *ceph_mdsc_build_path(struct dentry *dentry, int *plen, u64 *base,
> > > > > > >         return path;
> > > > > > >  }
> > > > > > >
> > > > > > > +/* Duplicate the dentry->d_name.name safely */
> > > > > > > +static int clone_dentry_name(struct dentry *dentry, const char **ppath,
> > > > > > > +                            int *ppathlen)
> > > > > > > +{
> > > > > > > +       u32 len;
> > > > > > > +       char *name;
> > > > > > > +retry:
> > > > > > > +       len = READ_ONCE(dentry->d_name.len);
> > > > > > > +       name = kmalloc(len + 1, GFP_NOFS);
> > > > > > > +       if (!name)
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > > +       spin_lock(&dentry->d_lock);
> > > > > > > +       if (dentry->d_name.len != len) {
> > > > > > > +               spin_unlock(&dentry->d_lock);
> > > > > > > +               kfree(name);
> > > > > > > +               goto retry;
> > > > > > > +       }
> > > > > > > +       memcpy(name, dentry->d_name.name, len);
> > > > > > > +       spin_unlock(&dentry->d_lock);
> > > > > > > +
> > > > > > > +       name[len] = '\0';
> > > > > > > +       *ppath = name;
> > > > > > > +       *ppathlen = len;
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static int build_dentry_path(struct dentry *dentry, struct inode *dir,
> > > > > > > -                            const char **ppath, int *ppathlen, u64 *pino,
> > > > > > > -                            int *pfreepath)
> > > > > > > +                            const char **ppath, int *ppathlen, u64 *pino)
> > > > > > >  {
> > > > > > >         char *path;
> > > > > > >
> > > > > > > @@ -2171,16 +2197,13 @@ static int build_dentry_path(struct dentry *dentry, struct inode *dir,
> > > > > > >         if (dir && ceph_snap(dir) == CEPH_NOSNAP) {
> > > > > > >                 *pino = ceph_ino(dir);
> > > > > > >                 rcu_read_unlock();
> > > > > > > -               *ppath = dentry->d_name.name;
> > > > > > > -               *ppathlen = dentry->d_name.len;
> > > > > > > -               return 0;
> > > > > > > +               return clone_dentry_name(dentry, ppath, ppathlen);
> > > > > > >         }
> > > > > > >         rcu_read_unlock();
> > > > > > >         path = ceph_mdsc_build_path(dentry, ppathlen, pino, 1);
> > > > > > >         if (IS_ERR(path))
> > > > > > >                 return PTR_ERR(path);
> > > > > > >         *ppath = path;
> > > > > > > -       *pfreepath = 1;
> > > > > > >         return 0;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -2222,8 +2245,9 @@ static int set_request_path_attr(struct inode *rinode, struct dentry *rdentry,
> > > > > > >                 dout(" inode %p %llx.%llx\n", rinode, ceph_ino(rinode),
> > > > > > >                      ceph_snap(rinode));
> > > > > > >         } else if (rdentry) {
> > > > > > > -               r = build_dentry_path(rdentry, rdiri, ppath, pathlen, ino,
> > > > > > > -                                       freepath);
> > > > > > > +               r = build_dentry_path(rdentry, rdiri, ppath, pathlen, ino);
> > > > > > > +               if (!r)
> > > > > > > +                       *freepath = 1;
> > > > > > >                 dout(" dentry %p %llx/%.*s\n", rdentry, *ino, *pathlen,
> > > > > > >                      *ppath);
> > > > > > >         } else if (rpath || rino) {
> > > > > > > --
> > > > > > > 2.20.1
> > > > > > >
> > > > > >
> > > > > > Looks good. I think we also need to update ceph_fill_trace(), make it
> > > > > > handle that case rinfo->dname != req->r_dentry->d_name.
> > > > > >
> > > > >
> > > > > Good catch. I'll send a separate patch for that one.
> > > > > --
> > > > > Jeff Layton <jlayton@xxxxxxxxxx>
> > > >
> > > >
> > > > So to make sure I understand, the worry here is that we can end up
> > > > getting back an rinfo->dname that doesn't match what we have in
> > > > r_dentry from the initial call?
> > > >
> > > > That is quite a different problem from what this patch fixes if so.
> > > > What should we do in the case where that occurs?
> > >
> > > keep r_dentry untouched and return a error?  ceph_d_revalidate()
> > > return -ECHILD after it gets the error.
> > >
> >
> > We wouldn't do a synchronous MDS call in rcuwalk mode, so returning
> > -ECHILD would probably not be what we'd want to do. We could have this
> > happen in any reply (not just d_revalidate), in principle, so I think
> > we need a general solution here.
> >
>
> why it can happen in any reply? I think it can happen only when
> i_mutex of parent dir is not locked.
>

Could we not race against operations being done by other clients?

AFAIU, an MDS could set a dentry in the trace whenever it likes, and
it doesn't necessarily know anything about what r_dentry we have
stashed (other than the info that gets encoded into the request).

>
> > I think you're probably right that we should not touch r_dentry in
> > that case though. I'll look at grafting that logic into this.
> >
> > In the meantime, I'd like to see this patch and the one I sent
> > yesterday in v5.1 and stable kernels. Did you want to add your
> > Reviewed-by?
>
> Sorry for the delay
>

No problem. More important to get it right than fast.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux