Re: [RFC - PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of the first attribute

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 07, 2017 at 01:26:09PM -0400, Brian Foster wrote:
> On Mon, Aug 07, 2017 at 09:37:44AM -0700, Darrick J. Wong wrote:
> > On Mon, Aug 07, 2017 at 11:16:54AM -0400, Brian Foster wrote:
> > > On Mon, Aug 07, 2017 at 03:54:03PM +0300, Alex Lyakas wrote:
> > > > The new attribute leaf buffer is not held locked across the transaction roll
> > > > between the shortform->leaf modification and the addition of the new entry.
> > > > As a result, the attribute buffer modification being made is not atomic
> > > > from an operational perspective.
> > > > Hence the AIL push can grab it in the transient state of "just created"
> > > > after the initial transaction is rolled, because the buffer has been
> > > > released.
> > > > This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero,
> > > > treating this as in-memory corruption, and shutting down the filesystem.
> > > > 
> > > > More info at:
> > > > https://www.spinics.net/lists/linux-xfs/msg08778.html
> > > > 
> ...
> > 
> > > > fs/xfs/libxfs/xfs_attr.c      | 24 +++++++++++++++++++++---
> > > > fs/xfs/libxfs/xfs_attr_leaf.c |  5 ++++-
> > > > fs/xfs/libxfs/xfs_attr_leaf.h |  2 +-
> > > > 3 files changed, 26 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > > > index 353fb42..c0ced12 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr.c
> ...
> > > > @@ -328,48 +329,65 @@ xfs_attr_set(
> ...
> > > >         if (error) {
> > > >             ASSERT(committed);
> > > > +            if (leaf_bp)
> > > > +                xfs_buf_relse(leaf_bp);
> > > 
> > > Hmm, I don't think this is right. We don't want to release the buffer
> > > while the transaction still has a reference. Perhaps we should move this
> > > to the "out:" patch after the transaction cancel (and make sure the
> > > pointer is reset at the appropriate place)..?
> > 
> > I would've thought that the buffer would be relse'd by xfs_trans_cancel,
> > having been bjoined to the first args.trans and bheld to any subsequent
> > transactions?

(Ugh, I can't follow myself either. Sorry for the confusing question.)

> I'm not following what you mean. I guess if the transaction has been
> committed by xfs_bmap_finish(), then the buffer is actually not in the
> current transaction, but I think we need to release it either way (so
> perhaps this code is fine).

I'm confused and going to step through through this code one piece at a time.

	/*
	 * It won't fit in the shortform, transform to a leaf block.
	 * GROT: another possible req'mt for a double-split btree op.
	 */
	xfs_defer_init(args.dfops, args.firstblock);
	error = xfs_attr_shortform_to_leaf(&args, &leaf_bp);

	/* hold the leaf buffer locked, when "args.trans" transaction commits */
	if (leaf_bp)
		xfs_trans_bhold(args.trans, leaf_bp);

If we hit an error in xfs_attr_shortform_to_leaf leaf_bp will have been
bjoined to args.trans by xfs_attr_shortform_to_leaf ->
xfs_attr3_leaf_create -> xfs_da_get_buf -> xfs_trans_get_buf_map, so
when we goto out in the error handling clause below, we'll trans_cancel,
which releases leaf_bp, right?  And we can't just xfs_buf_relse because
that would unlock a buffer that's still owned by args.trans (like Brian
said)...

	if (!error)
		error = xfs_defer_finish(&args.trans, args.dfops, dp);

...but if we hit an error here, we'll have committed the transaction but
held on to the locked buffer, so we /do/ need to xfs_buf_relse...

	if (error) {
		args.trans = NULL;
		xfs_defer_cancel(&dfops);
		goto out;
	}

...in which case we no longer can use this error handling clause for
both cases because we have differing ownership of leaf_bp.  In effect we
need something structured like this, I think?

	xfs_defer_init()
	error = xfs_attr_shortform_to_leaf(...);
	if (error) {
		xfs_defer_cancel(...);
		goto out;
	}
	if (leaf_bp)
		xfs_trans_bhold(args.trans, leaf_bp);

	error = xfs_defer_finish(...);
	if (error) {
		xfs_defer_cancel(...);
		if (leaf_bp)
			xfs_brelse(leaf_bp);
		goto out;
	}

...and then when we're ready to add the new attr:

	if (leaf_bp)
		xfs_trans_bjoin(args.trans, leaf_bp);
	error = xfs_attr_*_addname(...);


> 
> > > 
> > > >             args.trans = NULL;
> > > >             xfs_bmap_cancel(&flist);
> > > >             goto out;
> > > >         }
> > > > 
> > > >         /*
> > > >          * bmap_finish() may have committed the last trans and started
> > > >          * a new one.  We need the inode to be in all transactions.
> > > > +         * We also need to rejoin the leaf buffer to the new transaction
> > > > +         * and prevent it from being unlocked, before we commit it in
> > > > xfs_trans_roll().
> > > > +         * If bmap_finish() did not commit, then we are in the same
> > > > transaction,
> > > > +         * and no need to call xfs_trans_bhold() again.
> > > >          */
> > > > -        if (committed)
> > > > +        if (committed) {
> > > >             xfs_trans_ijoin(args.trans, dp, 0);
> > > > +            xfs_trans_bjoin(args.trans, leaf_bp);
> > > > +            xfs_trans_bhold(args.trans, leaf_bp);
> > > 
> > > I don't see why this is necessary. We still have the buffer held and
> > > locked, yes?
> > 
> > The bhold means that after the transaction roll leaf_bp is still locked
> > but not associated with a transaction, so it is necessary to bjoin
> > leaf_bp to the new transaction.  Since the very next thing we do is roll
> > the transaction again, we also want to bhold it for the second roll.
> > 
> 
> Ok, so we've created the empty block, logged it in the first
> transaction, marked it held and committed the tx. At that point the
> buffer can make it to the AIL, but we still have it locked (and held) so
> it cannot be written back. What difference does it make whether the
> buffer is added to this transaction? It won't be dirtied in the
> transaction, it can't be written back either way and we eventually join
> it to the transaction that creates the attr (after the roll).

Yes, you're right, leaf_bp is held by us and not the transaction, and we
don't need to touch leaf_bp for this empty roll, so we neither need to
bjoin it nor bhold it here.

> > (Question: is the second roll even necessary for this case?)
> > 
> 
> It's probably moot because I believe this code goes away in recent
> kernels as the committed thing was cleaned out, probably before we
> converted this over to xfs_defer_finish().

It's still there, though it's probably moot and benign.

> ...
> > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > index b7cd0a0..2e03c32 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.c
> > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
> ...
> > > > @@ -733,20 +734,22 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
> > > >         ASSERT(error == -ENOATTR);
> > > >         error = xfs_attr3_leaf_add(bp, &nargs);
> > > >         ASSERT(error != -ENOSPC);
> > > >         if (error)
> > > >             goto out;
> > > >         sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
> > > >     }
> > > >     error = 0;
> > > > 
> > > > out:
> > > > +    if (error == 0)
> > > > +        *bpp = bp;
> > > 
> > > It looks like the only way error == 0 is where it is set just above, so
> > > we could probably move this before the label.
> > > 
> > > I'm also wondering whether it might be cleaner to 1.) conditionally
> > > return the buffer when sf->hdr.count == 0 because that is the only case
> > > where the problem occurs and 2.) do the trans_bhold() here in
> > > shortform_to_leaf() when we do actually return it.
> > 
> > I /think/ you could get rid of that second roll and do the bhold in
> > shortform_to_leaf, though you'd still have to re-bjoin it after
> > defer_finish.
> > 
> 
> Yeah, the semantic is basically to return a buffer iff the callee held
> it in the tx because it was empty (maybe reneame leaf_bp to empty_bp or
> something more clear?), thereby making the caller responsible to release
> it. The caller happens to do that by joining it to the transaction that
> will add the attr.

<nod>

--D

> 
> Brian
> 
> > --D
> > 
> > > 
> > > Brian
> > > 
> > > >     kmem_free(tmpbuffer);
> > > >     return error;
> > > > }
> > > > 
> > > > /*
> > > >  * Check a leaf attribute block to see if all the entries would fit into
> > > >  * a shortform attribute list.
> > > >  */
> > > > int
> > > > xfs_attr_shortform_allfit(
> > > > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
> > > > index 4f3a60a..d58e9e4 100644
> > > > --- a/fs/xfs/libxfs/xfs_attr_leaf.h
> > > > +++ b/fs/xfs/libxfs/xfs_attr_leaf.h
> > > > @@ -41,21 +41,21 @@ typedef struct xfs_attr_inactive_list {
> > > >  * Function prototypes for the kernel.
> > > >  *========================================================================*/
> > > > 
> > > > /*
> > > >  * Internal routines when attribute fork size < XFS_LITINO(mp).
> > > >  */
> > > > void    xfs_attr_shortform_create(struct xfs_da_args *args);
> > > > void    xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
> > > > int    xfs_attr_shortform_lookup(struct xfs_da_args *args);
> > > > int    xfs_attr_shortform_getvalue(struct xfs_da_args *args);
> > > > -int    xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
> > > > +int    xfs_attr_shortform_to_leaf(struct xfs_da_args *args, xfs_buf_t
> > > > **bpp);
> > > > int    xfs_attr_shortform_remove(struct xfs_da_args *args);
> > > > int    xfs_attr_shortform_list(struct xfs_attr_list_context *context);
> > > > int    xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
> > > > int    xfs_attr_shortform_bytesfit(xfs_inode_t *dp, int bytes);
> > > > void    xfs_attr_fork_remove(struct xfs_inode *ip, struct xfs_trans *tp);
> > > > 
> > > > /*
> > > >  * Internal routines when attribute fork size == XFS_LBSIZE(mp).
> > > >  */
> > > > int    xfs_attr3_leaf_to_node(struct xfs_da_args *args);
> > > > -- 
> > > > 1.9.1
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux