Re: [PATCH] xfs: Fix double unlock in defer capture code

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 5 Nov 2021 09:18:18 +1100

On Thu, Nov 04, 2021 at 09:59:50AM -0700, Allison Henderson wrote:
> On 11/3/21 6:30 PM, Darrick J. Wong wrote:
> > On Thu, Nov 04, 2021 at 11:16:33AM +1100, Dave Chinner wrote:
> > > On Wed, Nov 03, 2021 at 02:33:09PM -0700, Allison Henderson wrote:
> > > > @@ -777,15 +805,25 @@ xfs_defer_ops_continue(
> > > >   	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
> > > >   	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> > > > -	/* Lock and join the captured inode to the new transaction. */
> > > > +	/* Lock the captured resources to the new transaction. */
> > > >   	if (dfc->dfc_held.dr_inos == 2)
> > > >   		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
> > > >   				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
> > > >   	else if (dfc->dfc_held.dr_inos == 1)
> > > >   		xfs_ilock(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL);
> > > > +
> > > > +	xfs_defer_relock_buffers(dfc);
> > > > +
> > > > +	/* Join the captured resources to the new transaction. */
> > > >   	xfs_defer_restore_resources(tp, &dfc->dfc_held);
> > > >   	memcpy(dres, &dfc->dfc_held, sizeof(struct xfs_defer_resources));
> > > > +	/*
> > > > +	 * Inodes must be passed back to the log recovery code to be unlocked,
> > > > +	 * but buffers do not.  Ignore the captured buffers
> > > > +	 */
> > > > +	dres->dr_bufs = 0;
> > > 
> > > I'm not sure what this comment is supposed to indicate. This seems
> > > to be infrastructure specific to log recovery, not general runtime
> > > functionality, but even in that context I don't really understand
> > > what it means or why it is done...
> > 
> > The defer_capture machinery picks up inodes that were ijoined with
> > lock_flags==0 (i.e. caller will unlock explicitly), which is why they
> > have to be passed back out after the entire transaction sequence
> > completes.

I'm still not grokking what "passed back out" is supposed to mean
or how it is implemented.

> > By contrast, the defer capture machinery picks up buffers with BLI_HOLD
> > set on the log item.  These are only meant to maintain the hold across
> > the next transaction roll (or the next defer_finish invocation), which
> > means that the caller is responsible for unlocking and releasing the
> > buffer (or I guess re-holding it) during that next transaction.

Sure, but buffers that have XFS_BLI_HOLD is set are not unlocked on
transaction commit. So this makes little sense to me.

A bunch of notes follows as I tried to make sense of this....

We have deferop "save/restore" resources functions that store held
buffers/inodes on save and hold them again on restore via a struct
xfs_defer_resources. This is only used to wrap transaction commits
in xfs_defer_trans_roll(), which means that the held objects stay
held across the entire transaction commit and defer ops processing.

Then we have "capture/free/continue/rele" which use the same struct
xfs_defer_resources but only takes direct references to buffers and
inodes they "hold" and rather than transaction scope references.
Hence before commit, they have to be relocked and rejoined to the
transaction. Ugh - same xfs_defer_resources structure, different
semantic meaning of contents.

Uses xfs_defer_restore_resources() internally, so it joins *and
holds* those items at the transaction level, meaning they do not get
unlocked by the subsequent transaction commit.  And then it is
committed like so:

                xfs_defer_ops_continue(dfc, tp, &dres);
		error = xfs_trans_commit(tp);
		xfs_defer_resources_rele(&dres);

And then because the objects are held and not unlocked by the
transaction commit, they need to be unlocked and released by the
xfs_defer_resources_rele() call.  But we've hacked dres.nbufs = 0,
so buffers are not released after transaction commit. This makes no
obvious sense - transaction commit does not free/release held
buffers, nor does xfs_defer_resources_rele(), so this just looks
like a buffer leak to me.

[ the API is a mess here - why does xfs_defer_ops_continue() memcpy
dfc->dres to dres, then get freed, then dres get passed to
xfs_defer_resources_rele()? Why isn't this simply:

		xfs_defer_ops_capture_continue(dfc, tp);
		error = xfs_trans_commit(tp);
		xfs_defer_ops_capture_rele(dfc);

The deferops functions are all single caller functions from log
recovery, so it doesn't make a huge amount of sense to me how or why
the code is structured this way. Indeed, I don't know why this
capture interface isn't part of the log recovery API, not core
deferops... ]

> Ok, so should we remove or expand the comment?  I thought it made sense with
> the commentary at the top of the function that talks about why inodes are
> passed back, but I am not picky.  How about:
> 
> /*
>  * Inodes must be passed back to the log recovery code to be unlocked,
>  * but buffers do not.  Buffers are released by the calling code, and
>  * only need to be transferred to the next transaction.  Ignore
>  * captured buffers here
>  */

This still just describes what the code does, and so I have no
more insight into what is actually doing the releasing of these
buffers and why they behave differently to inodes....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx