On Mon, Aug 07, 2017 at 08:00:46PM +0300, Alex Lyakas wrote: > Hello Brian, > > Thanks for the review. > > On Mon, Aug 7, 2017 at 6:16 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote: > > On Mon, Aug 07, 2017 at 03:54:03PM +0300, Alex Lyakas wrote: > >> The new attribute leaf buffer is not held locked across the transaction roll > >> between the shortform->leaf modification and the addition of the new entry. > >> As a result, the attribute buffer modification being made is not atomic > >> from an operational perspective. > >> Hence the AIL push can grab it in the transient state of "just created" > >> after the initial transaction is rolled, because the buffer has been > >> released. > >> This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero, > >> treating this as in-memory corruption, and shutting down the filesystem. > >> > >> More info at: > >> https://www.spinics.net/lists/linux-xfs/msg08778.html > >> > > > > FYI.. I'm not sure how appropriate it is to use URLs in commit log > > descriptions. Perhaps somebody else can chime in on that. > I will get rid of it. > > > > >> This patch is based on kernel 3.18.19, which is a long-term (although EOL) > >> kernel. > >> This is the reason it is marked as RFC. > >> > > > > It's probably best to develop against the latest kernel, get the fix > > right, then worry about v3.18 for a backport. > Unfortunately, for us this is exactly the opposite. We are running > kernel 3.18 in production, and that's where we need the fix. Moving to > a different kernel is a significant effort for us. We do it from time > to time, but it's always to a long-term stable kernel and not to one > of the latests. Still, below I provide a patch for 4.13-rc4. > This is the same as for most people. Nobody is running the latest for-next kernel in a critical production environment. We debug/diagnose issues against older kernels all the time, fix in the latest development tree and backport from there as needed. Once the fix is backported and picked up in a stable kernel, you can update your production systems at that point. It sounds like you have a reliable reproducer (albeit with injection of an artificial delay), so you should be able to reproduce and test/verify against a development kernel. This doesn't require moving any of your infrastructure to the latest kernel (use a vm, for example). > > > >> Reproducing the issue is achieved by adding "msleep(1000)" after the > >> xfs_trans_roll() call. > >> From the limited testing, this patch seems to fix the issue. > >> Once/if the community approves this patch, we will do a formal testing. > >> > >> Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx> > >> --- > > > > Note that your patch doesn't apply for me: > > > > ... > > patching file fs/xfs/libxfs/xfs_attr.c > > Hunk #1 FAILED at 285. > > patch: **** malformed patch at line 70: commits */ > > > > Perhaps it has been damaged by your mailer? Otherwise, a few initial > > comments... > Trying a different mailer now. > Same problem. It looks like everything is converted to spaces. Are you copy+pasting the patch into an email? Note that usually will not work. You can use something like 'git send-email' to post correctly formatted patches over mail. Also note you can send it to yourself initially and test apply it (and/or run scripts/checkpatch.pl against it) to make sure it works. > > > >> fs/xfs/libxfs/xfs_attr.c | 24 +++++++++++++++++++++--- > >> fs/xfs/libxfs/xfs_attr_leaf.c | 5 ++++- > >> fs/xfs/libxfs/xfs_attr_leaf.h | 2 +- > >> 3 files changed, 26 insertions(+), 5 deletions(-) > >> > >> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > >> index 353fb42..c0ced12 100644 > >> --- a/fs/xfs/libxfs/xfs_attr.c > >> +++ b/fs/xfs/libxfs/xfs_attr.c ... > >> @@ -328,48 +329,65 @@ xfs_attr_set( ... > > Hmm, I don't think this is right. We don't want to release the buffer > > while the transaction still has a reference. Perhaps we should move this > > to the "out:" patch after the transaction cancel (and make sure the > > pointer is reset at the appropriate place)..? > I think I understand what you are saying. After we call > xfs_trans_bhold(), we need first the original transaction to commit or > to abort. Then we must either rejoin the buffer to a different > transaction or release it. I believe the below patch addresses this > issue. > As noted in the last mail, it may not matter since the tx is committed or aborted by the time an error is returned here. That said, I still prefer the code you have below. :) ... > >> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > >> index b7cd0a0..2e03c32 100644 > >> --- a/fs/xfs/libxfs/xfs_attr_leaf.c > >> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > >> @@ -649,23 +649,24 @@ xfs_attr_shortform_getvalue(xfs_da_args_t *args) > >> args->valuelen = sfe->valuelen; > >> memcpy(args->value, &sfe->nameval[args->namelen], > >> args->valuelen); > >> return -EEXIST; > >> } > >> return -ENOATTR; > >> } > >> > >> /* > >> * Convert from using the shortform to the leaf. > >> + * Upon success, return the leaf buffer. > >> */ > >> int > >> -xfs_attr_shortform_to_leaf(xfs_da_args_t *args) > >> +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, xfs_buf_t **bpp) > >> { > >> xfs_inode_t *dp; > >> xfs_attr_shortform_t *sf; > >> xfs_attr_sf_entry_t *sfe; > >> xfs_da_args_t nargs; > >> char *tmpbuffer; > >> int error, i, size; > >> xfs_dablk_t blkno; > >> struct xfs_buf *bp; > >> xfs_ifork_t *ifp; > >> @@ -733,20 +734,22 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args) > >> ASSERT(error == -ENOATTR); > >> error = xfs_attr3_leaf_add(bp, &nargs); > >> ASSERT(error != -ENOSPC); > >> if (error) > >> goto out; > >> sfe = XFS_ATTR_SF_NEXTENTRY(sfe); > >> } > >> error = 0; > >> > >> out: > >> + if (error == 0) > >> + *bpp = bp; > > > > It looks like the only way error == 0 is where it is set just above, so > > we could probably move this before the label. > I am a bit confused by this code in xfs_attr_shortform_to_leaf(): > ... > error = xfs_attr3_leaf_create(args, blkno, &bp); > if (error) { > error = xfs_da_shrink_inode(args, 0, bp); > bp = NULL; > if (error) > goto out; > xfs_idata_realloc(dp, size, XFS_ATTR_FORK); /* try to put */ > memcpy(ifp->if_u1.if_data, tmpbuffer, size); /* it back */ > goto out; > } > Here we fail to convert to a leaf. But if xfs_da_shrink_inode() > succeeds, we return error==0 back to the caller. But we actually > failed to convert. This, however, is not directly related to your > comment. > Hm, I'm not aware of any reason to intentionally clear the error in that case. That may be a bug. Brian > > > > I'm also wondering whether it might be cleaner to 1.) conditionally > > return the buffer when sf->hdr.count == 0 because that is the only case > > where the problem occurs and 2.) do the trans_bhold() here in > > shortform_to_leaf() when we do actually return it. > > > > Brian > > > >> kmem_free(tmpbuffer); > >> return error; > >> } > >> > > Here is the patch based on kernel 4.13-rc4. > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > index de7b9bd..982e322 100644 > --- a/fs/xfs/libxfs/xfs_attr.c > +++ b/fs/xfs/libxfs/xfs_attr.c > @@ -218,6 +218,7 @@ > xfs_fsblock_t firstblock; > int rsvd = (flags & ATTR_ROOT) != 0; > int error, err2, local; > + struct xfs_buf *leaf_bp = NULL; > > XFS_STATS_INC(mp, xs_attr_set); > > @@ -327,7 +328,13 @@ > * GROT: another possible req'mt for a double-split btree op. > */ > xfs_defer_init(args.dfops, args.firstblock); > - error = xfs_attr_shortform_to_leaf(&args); > + error = xfs_attr_shortform_to_leaf(&args, &leaf_bp); > + /* > + * Prevent the leaf buffer from being unlocked > + * when "args.trans" transaction commits. > + */ > + if (leaf_bp) > + xfs_trans_bhold(args.trans, leaf_bp); > if (!error) > error = xfs_defer_finish(&args.trans, args.dfops, dp); > if (error) { > @@ -345,6 +352,14 @@ > if (error) > goto out; > > + /* > + * Rejoin the leaf buffer to the new transaction. > + * This allows a subsequent read to find the buffer in the > + * transaction (and avoid a deadlock). > + */ > + xfs_trans_bjoin(args.trans, leaf_bp); > + /* Prevent from being released at the end of the function */ > + leaf_bp = NULL; > } > > if (xfs_bmap_one_block(dp, XFS_ATTR_FORK)) > @@ -376,6 +391,8 @@ > out: > if (args.trans) > xfs_trans_cancel(args.trans); > + if (leaf_bp) > + xfs_buf_relse(leaf_bp); > xfs_iunlock(dp, XFS_ILOCK_EXCL); > return error; > } > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c > index c6c15e5..ab73e4b 100644 > --- a/fs/xfs/libxfs/xfs_attr_leaf.c > +++ b/fs/xfs/libxfs/xfs_attr_leaf.c > @@ -740,9 +740,10 @@ STATIC void xfs_attr3_leaf_moveents(struct > xfs_da_args *args, > > /* > * Convert from using the shortform to the leaf. > + * Upon success, return the leaf buffer. > */ > int > -xfs_attr_shortform_to_leaf(xfs_da_args_t *args) > +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, struct xfs_buf **bpp) > { > xfs_inode_t *dp; > xfs_attr_shortform_t *sf; > @@ -822,6 +823,7 @@ STATIC void xfs_attr3_leaf_moveents(struct > xfs_da_args *args, > sfe = XFS_ATTR_SF_NEXTENTRY(sfe); > } > error = 0; > + *bpp = bp; > > out: > kmem_free(tmpbuffer); > diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h > index f7dda0c..7382d4e 100644 > --- a/fs/xfs/libxfs/xfs_attr_leaf.h > +++ b/fs/xfs/libxfs/xfs_attr_leaf.h > @@ -48,7 +48,7 @@ > void xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff); > int xfs_attr_shortform_lookup(struct xfs_da_args *args); > int xfs_attr_shortform_getvalue(struct xfs_da_args *args); > -int xfs_attr_shortform_to_leaf(struct xfs_da_args *args); > +int xfs_attr_shortform_to_leaf(struct xfs_da_args *args, struct > xfs_buf **bpp); > int xfs_attr_shortform_remove(struct xfs_da_args *args); > int xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp); > int xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes); > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html