Re: [PATCH 0.1/13] xfs: release new dquot buffer on defer_finish error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 04, 2018 at 12:03:56PM -0400, Brian Foster wrote:
> On Fri, May 04, 2018 at 08:52:36AM -0700, Darrick J. Wong wrote:
> > On Fri, May 04, 2018 at 11:41:21AM -0400, Brian Foster wrote:
> > > On Fri, May 04, 2018 at 08:12:47AM -0700, Darrick J. Wong wrote:
> > > > On Fri, May 04, 2018 at 07:31:58AM -0400, Brian Foster wrote:
> > > > > On Thu, May 03, 2018 at 10:53:43AM -0700, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > > 
> > > > > > In commit efa092f3d4c6 "[XFS] Fixes a bug in the quota code when
> > > > > > allocating a new dquot record", we allocate a new dquot block, grab a
> > > > > > buffer to initialize it, and return the locked initialized dquot buffer
> > > > > > to the caller for further in-core dquot initialization.  Unfortunately,
> > > > > > if the _bmap_finish errored out, _qm_dqalloc would also error out
> > > > > > without bothering to free the (locked) buffer.  Leaking a locked buffer
> > > > > > caused hangs in generic/388 when quotas are enabled.
> > > > > > 
> > > > > > Furthermore, the _bmap_finish -> _defer_finish conversion in
> > > > > > 310a75a3c6c747 ("xfs: change xfs_bmap_{finish,cancel,init,free} ->
> > > > > > xfs_defer_*") failed to observe that the buffer was held going into
> > > > > > _defer_finish and therefore failed to notice that the buffer lock is
> > > > > > /not/ maintained afterwards.  Now that we can bjoin a buffer to a
> > > > > > defer_ops, use this mechanism to ensure that the buffer stays locked
> > > > > > across the _defer_finish.  Release the holds and locks on the buffer as
> > > > > > appropriate if we have to error out.
> > > > > > 
> > > > > > There is a subtlety here for the caller in that the buffer emerges
> > > > > > locked and held to the transaction, so if the _trans_commit fails we
> > > > > > have to release the buffer explicitly.  This fixes the unmount hang
> > > > > > in generic/388 when quotas are enabled.
> > > > > > 
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > > > > > ---
> > > > > >  fs/xfs/xfs_dquot.c |   48 +++++++++++++++++++++++++++---------------------
> > > > > >  1 file changed, 27 insertions(+), 21 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > > > > index a7daef9e16bf..4c39d8632230 100644
> > > > > > --- a/fs/xfs/xfs_dquot.c
> > > > > > +++ b/fs/xfs/xfs_dquot.c
> > > > > > @@ -362,33 +362,39 @@ xfs_qm_dqalloc(
> > > > > >  			      dqp->dq_flags & XFS_DQ_ALLTYPES, bp);
> > > > > >  
> > > > > >  	/*
> > > > > > -	 * xfs_defer_finish() may commit the current transaction and
> > > > > > -	 * start a second transaction if the freelist is not empty.
> > > > > > +	 * Hold the buffer and join it to the dfops so that we'll still own
> > > > > > +	 * the buffer when we return to the caller.  The buffer disposal on
> > > > > > +	 * error must be paid attention to very carefully, as it has been
> > > > > > +	 * broken since commit efa092f3d4c6 "[XFS] Fixes a bug in the quota
> > > > > > +	 * code when allocating a new dquot record" in 2005, and the later
> > > > > > +	 * conversion to xfs_defer_ops in commit 310a75a3c6c747 failed to keep
> > > > > > +	 * the buffer locked across the _defer_finish call.  We can now do
> > > > > > +	 * this correctly with xfs_defer_bjoin.
> > > > > >  	 *
> > > > > > -	 * Since we still want to modify this buffer, we need to
> > > > > > -	 * ensure that the buffer is not released on commit of
> > > > > > -	 * the first transaction and ensure the buffer is added to the
> > > > > > -	 * second transaction.
> > > > > > +	 * Above, we allocated a disk block for the dquot information and
> > > > > > +	 * used get_buf to initialize the dquot.  If the _defer_bjoin fails,
> > > > > > +	 * the buffer is still locked to *tpp, so we must _bhold_release and
> > > > > > +	 * then _trans_brelse the buffer.  If the _defer_finish fails, the old
> > > > > > +	 * transaction is gone but the new buffer is not joined or held to any
> > > > > > +	 * transaction, so we must _buf_relse it.
> > > > > >  	 *
> > > > > > -	 * If there is only one transaction then don't stop the buffer
> > > > > > -	 * from being released when it commits later on.
> > > > > > +	 * If everything succeeds, the caller of this function is returned a
> > > > > > +	 * buffer that is locked, held, and joined to the transaction.  If the
> > > > > > +	 * transaction commit fails (in the caller) the caller must unlock the
> > > > > > +	 * buffer manually.
> > > > > 
> > > > > If the buffer is held due to the xfs_defer_bjoin(), doesn't that mean
> > > > > that the caller has to ultimately release it even after successful
> > > > > transaction commit (assuming we don't roll the transaction again
> > > > > somewhere)? I see we have an xfs_trans_brelse() up in xfs_qm_dqread(),
> > > > > but it looks like that only clears the hold if the buffer isn't logged
> > > > > in the tx. Hm?
> > > > 
> > > > Correct.  The buffer is initialized in the same transaction as the dquot
> > > > block allocation and committed in xfs_defer_finish.  After
> > > > initialization (which is to say when we return to xfs_qm_dqtobp), the
> > > > buffer is held, joined, and not logged to the transaction, and nothing
> > > > else is supposed to dirty the buffer.  Both buffer and transaction are
> > > > then returned in this state to _dqread, which the in-core dquot state
> > > > out of the dquot buffer and _trans_brelse's the (still clean) buffer,
> > > > which breaks the hold and unlocks the buffer.
> > > > 
> > > 
> > > Ok that makes sense, but doesn't that depend on having a deferred
> > > operation? Is that always guaranteed here?
> > 
> > Assuming you meant the case where we _trans_read_buf'd the dquot buffer
> > in from disk, we return a buffer that's clean, locked, and joined to the
> > transaction.  The only difference is that the buffer isn't held, but
> > _trans_brelse clears the hold unconditionally.
> > 
> 
> I'm referring to the xfs_qm_dqalloc() case. It looks like we
> xfs_bmapi_write(), get the buffer, call xfs_qm_init_dquot_blk() (logs
> the buffer) then go into the defer/exit sequence modified by this
> patch...
> 
> If the defer finish doesn't do anything, are we in the right state or is
> the buffer still dirty+held? If the latter, doesn't that mean the buffer
> remains held after the caller commits the current transaction?

Ah, right, now I see what you're getting at.

If the defer_finish does nothing then yes we do exit up to _dqread with
the buffer locked, dirty, joined, and held to the transaction, in which
case the xfs_trans_brelse does nothing.  The transaction commit right
after that will log the buffer and release the hold, but the buffer's
still locked.

So I guess we need to roll the transaction after the defer_finish to
guarantee that the buffer is no longer dirty.  Potentially we could
modify _defer_finish to always roll at least once if any of the
_defer_[bi]join items are dirty so that we always return with the
_defer_*join'd items clean?

For now it's probably fine to _bhold_release it after the _defer_finish
(as Brian just suggested on irc) and reconsider "_defer_finish with
defer_joined dirty stuff" separately.

The refactor fixes all this so that we always commit the transaction
before handing back a bjoin'd buffer, so the state is consistent.

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > After the refactor we guarantee that the buffer is locked, clean, and
> > > > not attached to a transaction by the time we get to calling
> > > > xfs_dquot_from_disk rather than returning transaction and buffer up the
> > > > call stack and having to reason up the stack about what state they're in.
> > > > 
> > > > --D
> > > > 
> > > > > Brian
> > > > > 
> > > > > >  	 */
> > > > > > -
> > > > > > -	xfs_trans_bhold(tp, bp);
> > > > > > -
> > > > > > +	xfs_trans_bhold(*tpp, bp);
> > > > > > +	error = xfs_defer_bjoin(&dfops, bp);
> > > > > > +	if (error) {
> > > > > > +		xfs_trans_bhold_release(*tpp, bp);
> > > > > > +		xfs_trans_brelse(*tpp, bp);
> > > > > > +		goto error1;
> > > > > > +	}
> > > > > >  	error = xfs_defer_finish(tpp, &dfops);
> > > > > > -	if (error)
> > > > > > +	if (error) {
> > > > > > +		xfs_buf_relse(bp);
> > > > > >  		goto error1;
> > > > > > -
> > > > > > -	/* Transaction was committed? */
> > > > > > -	if (*tpp != tp) {
> > > > > > -		tp = *tpp;
> > > > > > -		xfs_trans_bjoin(tp, bp);
> > > > > > -	} else {
> > > > > > -		xfs_trans_bhold_release(tp, bp);
> > > > > >  	}
> > > > > > -
> > > > > > -	*O_bpp = bp;
> > > > > >  	return 0;
> > > > > >  
> > > > > >  error1:
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > > --
> > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux