On Wed, 2018-10-24 at 14:24 +0800, Yan, Zheng wrote: > On Mon, Oct 15, 2018 at 9:05 PM Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > On Mon, 2018-10-15 at 20:37 +0800, Yan, Zheng wrote: > > > > On Oct 15, 2018, at 19:29, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > On Mon, 2018-10-15 at 09:58 +0800, Yan, Zheng wrote: > > > > > > On Oct 12, 2018, at 18:56, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > > On Fri, 2018-10-12 at 10:39 +0800, Yan, Zheng wrote: > > > > > > > > On Oct 11, 2018, at 22:49, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > On Fri, 2018-09-28 at 17:43 +0800, Yan, Zheng wrote: > > > > > > > > > This reverts commit 8b8f53af1ed9df88a4c0fbfdf3db58f62060edf3. > > > > > > > > > > > > > > > > > > splice_dentry() is used by three places. For two places, req->r_dentry > > > > > > > > > is passed to splice_dentry(). In the case of error, req->r_dentry does > > > > > > > > > not get updated. So splice_dentry() should not drop reference. > > > > > > > > > > > > > > > > > > Cc: stable@xxxxxxxxxxxxxxx #4.18 > > > > > > > > > Signed-off-by: "Yan, Zheng" <zyan@xxxxxxxxxx> > > > > > > > > > --- > > > > > > > > > fs/ceph/inode.c | 1 - > > > > > > > > > 1 file changed, 1 deletion(-) > > > > > > > > > > > > > > > > > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > > > > > > > > > index c6bbb7aa99e4..375924b2bc86 100644 > > > > > > > > > --- a/fs/ceph/inode.c > > > > > > > > > +++ b/fs/ceph/inode.c > > > > > > > > > @@ -1140,7 +1140,6 @@ static struct dentry *splice_dentry(struct dentry *dn, struct inode *in) > > > > > > > > > if (IS_ERR(realdn)) { > > > > > > > > > pr_err("splice_dentry error %ld %p inode %p ino %llx.%llx\n", > > > > > > > > > PTR_ERR(realdn), dn, in, ceph_vinop(in)); > > > > > > > > > - dput(dn); > > > > > > > > > dn = realdn; /* note realdn contains the error */ > > > > > > > > > goto out; > > > > > > > > > } else if (realdn) { > > > > > > > > > > > > > > > > This might be ok, buI have some real concerns about splice_dentry and > > > > > > > > its callers -- particularly ceph_fill_trace: > > > > > > > > > > > > > > > > We hold a reference to dn on entry to splice_dentry. We then call > > > > > > > > d_splice_alias and get back an error, and now we don't put the old > > > > > > > > dentry. > > > > > > > > > > > > > > > > Fine -- we have to then expect the caller to do it. Unfortunately, the > > > > > > > > callers in ceph_fill_trace do this: > > > > > > > > > > > > > > > > dn = splice_dentry(dn, in); > > > > > > > > if (IS_ERR(dn)) { > > > > > > > > err = PTR_ERR(dn); > > > > > > > > goto done; > > > > > > > > } > > > > > > > > req->r_dentry = dn; /* may have spliced */ > > > > > > > > > > > > > > > > The old value of dn gets clobbered once that comes back with an ERR_PTR. > > > > > > > > I guess we could claim that r_dentry will still be set to the old value > > > > > > > > at that point and that it would get cleaned up when it gets cleaned up. > > > > > > > > > > > > > > > > But...I see this higher up in ceph_fill_trace at the end of the > > > > > > > > CEPH_MFS_OP_RENAME condition block: > > > > > > > > > > > > > > > > dn = req->r_old_dentry; /* use old_dentry */ > > > > > > > > > > > > > > > > So now I'm worried about the case where the splice succeeds. ISTM that > > > > > > > > "dn" can represent either r_dentry or r_old_dentry at the point where > > > > > > > > splice_dentry gets called, but we only ever reset the value of r_dentry > > > > > > > > there. > > > > > > > > > > > > > > > > If dn == r_old_dentry at the time that splice_dentry is called, and then > > > > > > > > that succeeds, we'll end up leaking the reference to r_dentry and then > > > > > > > > doing an overput on r_old_dentry. > > > > > > > > > > > > > > > > > > > > > > Good catch > > > > > > > > > > > > > > > I think it might help to establish clear "ownership" of the dentry > > > > > > > > references throughout that function. Consider zeroing out r_dentry and > > > > > > > > r_old_dentry at the time that you set the local variables? That might > > > > > > > > make this whole thing less fragile. > > > > > > > > > > > > > > Before the function return, we need to set req->r_dentry again. This will introduce duplicated code because there are several ‘goto’ in the function. How about following change. > > > > > > > > > > > > > > > > > > > > > - dn = req->r_old_dentry; /* use old_dentry */ > > > > > > > + /* swap r_dentry and r_old_dentry */ > > > > > > > + req->r_dentry = req->r_old_dentry; > > > > > > > + req->r_old_dentry = dn; > > > > > > > + dn = req->r_dentry; > > > > > > > > > > > > > > > > > > > Why do we need to reset r_dentry? I don't see where it gets used after > > > > > > ceph_fill_trace returns, so I gather we're just resetting it to clean up > > > > > > the references? Your proposed fix looks like it would fix that bug. It > > > > > > will mean that the values in the req will change but maybe that's ok. > > > > > > > > > > > > > > > > It’s used by ceph_finish_lookup > > > > > > > > > > > > > In that case, it might be bad idea to overwrite the value of r_dentry > > > > with the r_old_dentry pointer? > > > > > > ceph_finish_lookup() is not used in the rename case ;) > > > > > > > > > > Ok, but still...I worry about this sort of special-casing, as it can > > lead to people making incorrect assumptions later. I think it would be > > best to just take a dentry ref for "dn" and just put it before it goes > > out of scope. > > how about below patch. let splice_dentry() to update req->r_dentry. > > > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c > index e8ce4e66bc47..739737f4e0e1 100644 > --- a/fs/ceph/inode.c > +++ b/fs/ceph/inode.c > @@ -1106,8 +1106,9 @@ static void update_dentry_lease(struct dentry *dentry, > * splice a dentry to an inode. > * caller must hold directory i_mutex for this to be safe. > */ > -static struct dentry *splice_dentry(struct dentry *dn, struct inode *in) > +static int splice_dentry(struct dentry **pdn, struct inode *in) > { > + struct dentry *dn = *pdn; > struct dentry *realdn; > > BUG_ON(d_inode(dn)); > @@ -1140,28 +1141,23 @@ static struct dentry *splice_dentry(struct dentry *dn, struct inode *in) > if (IS_ERR(realdn)) { > pr_err("splice_dentry error %ld %p inode %p ino %llx.%llx\n", > PTR_ERR(realdn), dn, in, ceph_vinop(in)); > - dn = realdn; > - /* > - * Caller should release 'dn' in the case of error. > - * If 'req->r_dentry' is passed to this function, > - * caller should leave 'req->r_dentry' untouched. > - */ > - goto out; > - } else if (realdn) { > + return PTR_ERR(realdn); > + } > + > + if (realdn) { > dout("dn %p (%d) spliced with %p (%d) " > "inode %p ino %llx.%llx\n", > dn, d_count(dn), > realdn, d_count(realdn), > d_inode(realdn), ceph_vinop(d_inode(realdn))); > dput(dn); > - dn = realdn; > + *pdn = realdn; > } else { > BUG_ON(!ceph_dentry(dn)); > dout("dn %p attached to %p ino %llx.%llx\n", > dn, d_inode(dn), ceph_vinop(d_inode(dn))); > } > -out: > - return dn; > + return 0; > } > > /* > @@ -1348,7 +1344,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) > dout("dn %p gets new offset %lld\n", req->r_old_dentry, > ceph_dentry(req->r_old_dentry)->offset); > > - dn = req->r_old_dentry; /* use old_dentry */ > + /* swap r_dentry and r_old_dentry */ > + req->r_dentry = req->r_old_dentry; > + req->r_old_dentry = dn; > + dn = req->r_dentry; I think the above deserves a comment explaining why this is safe to do for rename ops. > } > > /* null dentry? */ > @@ -1373,12 +1372,10 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) > if (d_really_is_negative(dn)) { > ceph_dir_clear_ordered(dir); > ihold(in); > - dn = splice_dentry(dn, in); > - if (IS_ERR(dn)) { > - err = PTR_ERR(dn); > + err = splice_dentry(&req->r_dentry, in); > + if (err < 0) > goto done; > - } > - req->r_dentry = dn; /* may have spliced */ > + dn = req->r_dentry; /* may have spliced */ > } else if (d_really_is_positive(dn) && d_inode(dn) != in) { > dout(" %p links to %p %llx.%llx, not %llx.%llx\n", > dn, d_inode(dn), ceph_vinop(d_inode(dn)), > @@ -1398,22 +1395,18 @@ int ceph_fill_trace(struct super_block *sb, struct ceph_mds_request *req) > } else if ((req->r_op == CEPH_MDS_OP_LOOKUPSNAP || > req->r_op == CEPH_MDS_OP_MKSNAP) && > !test_bit(CEPH_MDS_R_ABORTED, &req->r_req_flags)) { > - struct dentry *dn = req->r_dentry; > struct inode *dir = req->r_parent; > > /* fill out a snapdir LOOKUPSNAP dentry */ > - BUG_ON(!dn); > BUG_ON(!dir); > BUG_ON(ceph_snap(dir) != CEPH_SNAPDIR); > - dout(" linking snapped dir %p to dn %p\n", in, dn); > + BUG_ON(!req->r_dentry); > + dout(" linking snapped dir %p to dn %p\n", in, req->r_dentry); > ceph_dir_clear_ordered(dir); > ihold(in); > - dn = splice_dentry(dn, in); > - if (IS_ERR(dn)) { > - err = PTR_ERR(dn); > + err = splice_dentry(&req->r_dentry, in); > + if (err < 0) > goto done; > - } > - req->r_dentry = dn; /* may have spliced */ > } else if (rinfo->head->is_dentry) { > struct ceph_vino *ptvino = NULL; > > @@ -1677,8 +1670,6 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, > } > > if (d_really_is_negative(dn)) { > - struct dentry *realdn; > - > if (ceph_security_xattr_deadlock(in)) { > dout(" skip splicing dn %p to inode %p" > " (security xattr deadlock)\n", dn, in); > @@ -1687,13 +1678,9 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, > goto next_item; > } > > - realdn = splice_dentry(dn, in); > - if (IS_ERR(realdn)) { > - err = PTR_ERR(realdn); > - d_drop(dn); > + err = splice_dentry(&dn, in); > + if (err < 0) > goto next_item; > - } > - dn = realdn; > } > > ceph_dentry(dn)->offset = rde->offset; > @@ -1709,8 +1696,7 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req, > err = ret; > } > next_item: > - if (dn) > - dput(dn); > + dput(dn); > } > out: > if (err == 0 && skipped == 0) { > The rest seems to be ok. > > > > > At the very least, if you are going to override r_dentry like this, > > please add a comment explaining why it's safe to do this in the rename case. > > > > > > > > > > > > > > > > > Another idea might be to just keep an extra reference to "dn", and not > > > > > > reset r_dentry and r_old_dentry? It's a bit more code, but it might be a > > > > > > cleaner fix, long-term. > > > > > > > > > > I will check, Thanks > > > > > > > > > > Yan, Zheng > > > > > > > > > > > -- > > > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > > > > > > > > > > > -- > > > > Jeff Layton <jlayton@xxxxxxxxxx> > > > > > > > > > > > > > > -- > > Jeff Layton <jlayton@xxxxxxxxxx> > > -- Jeff Layton <jlayton@xxxxxxxxxx>