Re: [RFC - PATCH] xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of the first attribute

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Brian,

Thanks for the review.

On Mon, Aug 7, 2017 at 6:16 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote:
> On Mon, Aug 07, 2017 at 03:54:03PM +0300, Alex Lyakas wrote:
>> The new attribute leaf buffer is not held locked across the transaction roll
>> between the shortform->leaf modification and the addition of the new entry.
>> As a result, the attribute buffer modification being made is not atomic
>> from an operational perspective.
>> Hence the AIL push can grab it in the transient state of "just created"
>> after the initial transaction is rolled, because the buffer has been
>> released.
>> This leads to xfs_attr3_leaf_verify() asserting that hdr.count is zero,
>> treating this as in-memory corruption, and shutting down the filesystem.
>>
>> More info at:
>> https://www.spinics.net/lists/linux-xfs/msg08778.html
>>
>
> FYI.. I'm not sure how appropriate it is to use URLs in commit log
> descriptions. Perhaps somebody else can chime in on that.
I will get rid of it.

>
>> This patch is based on kernel 3.18.19, which is a long-term (although EOL)
>> kernel.
>> This is the reason it is marked as RFC.
>>
>
> It's probably best to develop against the latest kernel, get the fix
> right, then worry about v3.18 for a backport.
Unfortunately, for us this is exactly the opposite. We are running
kernel 3.18 in production, and that's where we need the fix. Moving to
a different kernel is a significant effort for us. We do it from time
to time, but it's always to a long-term stable kernel and not to one
of the latests. Still, below I provide a patch for 4.13-rc4.

>
>> Reproducing the issue is achieved by adding "msleep(1000)" after the
>> xfs_trans_roll() call.
>> From the limited testing, this patch seems to fix the issue.
>> Once/if the community approves this patch, we will do a formal testing.
>>
>> Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>> ---
>
> Note that your patch doesn't apply for me:
>
> ...
> patching file fs/xfs/libxfs/xfs_attr.c
> Hunk #1 FAILED at 285.
> patch: **** malformed patch at line 70: commits */
>
> Perhaps it has been damaged by your mailer? Otherwise, a few initial
> comments...
Trying a different mailer now.

>
>> fs/xfs/libxfs/xfs_attr.c      | 24 +++++++++++++++++++++---
>> fs/xfs/libxfs/xfs_attr_leaf.c |  5 ++++-
>> fs/xfs/libxfs/xfs_attr_leaf.h |  2 +-
>> 3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
>> index 353fb42..c0ced12 100644
>> --- a/fs/xfs/libxfs/xfs_attr.c
>> +++ b/fs/xfs/libxfs/xfs_attr.c
>> @@ -285,20 +285,21 @@ xfs_attr_set(
>>
>>     xfs_trans_ijoin(args.trans, dp, 0);
>>
>>     /*
>>      * If the attribute list is non-existent or a shortform list,
>>      * upgrade it to a single-leaf-block attribute list.
>>      */
>>     if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL ||
>>         (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS &&
>>          dp->i_d.di_anextents == 0)) {
>> +        xfs_buf_t *leaf_bp = NULL;
>
> Use struct xfs_buf here and throughout. We're attempting to remove uses
> of the old typedefs wherever possible.
Noted.

>
>>
>>         /*
>>          * Build initial attribute list (if required).
>>          */
>>         if (dp->i_d.di_aformat == XFS_DINODE_FMT_EXTENTS)
>>             xfs_attr_shortform_create(&args);
>>
>>         /*
>>          * Try to add the attr to the attribute list in
>>          * the inode.
>> @@ -328,48 +329,65 @@ xfs_attr_set(
>>             xfs_iunlock(dp, XFS_ILOCK_EXCL);
>>
>>             return error ? error : err2;
>>         }
>>
>>         /*
>>          * It won't fit in the shortform, transform to a leaf block.
>>          * GROT: another possible req'mt for a double-split btree op.
>>          */
>>         xfs_bmap_init(args.flist, args.firstblock);
>> -        error = xfs_attr_shortform_to_leaf(&args);
>> +        error = xfs_attr_shortform_to_leaf(&args, &leaf_bp);
>> +        /* hold the leaf buffer locked, when "args.trans" transaction
>> commits */
>> +        if (leaf_bp)
>> +            xfs_trans_bhold(args.trans, leaf_bp);
>
> Indentation looks off here and throughout.
>
>> +
>>         if (!error) {
>>             error = xfs_bmap_finish(&args.trans, args.flist,
>>                         &committed);
>>         }
>>         if (error) {
>>             ASSERT(committed);
>> +            if (leaf_bp)
>> +                xfs_buf_relse(leaf_bp);
>
> Hmm, I don't think this is right. We don't want to release the buffer
> while the transaction still has a reference. Perhaps we should move this
> to the "out:" patch after the transaction cancel (and make sure the
> pointer is reset at the appropriate place)..?
I think I understand what you are saying. After we call
xfs_trans_bhold(), we need first the original transaction to commit or
to abort. Then we must either rejoin the buffer to a different
transaction or release it. I believe the below patch addresses this
issue.

>
>>             args.trans = NULL;
>>             xfs_bmap_cancel(&flist);
>>             goto out;
>>         }
>>
>>         /*
>>          * bmap_finish() may have committed the last trans and started
>>          * a new one.  We need the inode to be in all transactions.
>> +         * We also need to rejoin the leaf buffer to the new transaction
>> +         * and prevent it from being unlocked, before we commit it in
>> xfs_trans_roll().
>> +         * If bmap_finish() did not commit, then we are in the same
>> transaction,
>> +         * and no need to call xfs_trans_bhold() again.
>>          */
>> -        if (committed)
>> +        if (committed) {
>>             xfs_trans_ijoin(args.trans, dp, 0);
>> +            xfs_trans_bjoin(args.trans, leaf_bp);
>> +            xfs_trans_bhold(args.trans, leaf_bp);
>
> I don't see why this is necessary. We still have the buffer held and
> locked, yes?
I believe you are right. We don't care if there was another
transaction in-between. Addressed this.

>
>> +        }
>>
>>         /*
>>          * Commit the leaf transformation.  We'll need another (linked)
>>          * transaction to add the new attribute to the leaf.
>>          */
>>
>>         error = xfs_trans_roll(&args.trans, dp);
>> -        if (error)
>> +        if (error) {
>> +            xfs_buf_relse(leaf_bp);
>
> Same problem as above.
>
>>             goto out;
>> +        }
>>
>> +        /* rejoin the leaf buffer to the new transaction */
>> +        xfs_trans_bjoin(args.trans, leaf_bp);
>
> This comment should point out that this allows a subsequent read to find
> the buffer in the transaction (and avoid a deadlock).
Done.

>
>>     }
>>
>>     if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
>>         error = xfs_attr_leaf_addname(&args);
>>     else
>>         error = xfs_attr_node_addname(&args);
>>     if (error)
>>         goto out;
>>
>>     /*
>> diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
>> index b7cd0a0..2e03c32 100644
>> --- a/fs/xfs/libxfs/xfs_attr_leaf.c
>> +++ b/fs/xfs/libxfs/xfs_attr_leaf.c
>> @@ -649,23 +649,24 @@ xfs_attr_shortform_getvalue(xfs_da_args_t *args)
>>         args->valuelen = sfe->valuelen;
>>         memcpy(args->value, &sfe->nameval[args->namelen],
>>                             args->valuelen);
>>         return -EEXIST;
>>     }
>>     return -ENOATTR;
>> }
>>
>> /*
>>  * Convert from using the shortform to the leaf.
>> + * Upon success, return the leaf buffer.
>>  */
>> int
>> -xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
>> +xfs_attr_shortform_to_leaf(xfs_da_args_t *args, xfs_buf_t **bpp)
>> {
>>     xfs_inode_t *dp;
>>     xfs_attr_shortform_t *sf;
>>     xfs_attr_sf_entry_t *sfe;
>>     xfs_da_args_t nargs;
>>     char *tmpbuffer;
>>     int error, i, size;
>>     xfs_dablk_t blkno;
>>     struct xfs_buf *bp;
>>     xfs_ifork_t *ifp;
>> @@ -733,20 +734,22 @@ xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
>>         ASSERT(error == -ENOATTR);
>>         error = xfs_attr3_leaf_add(bp, &nargs);
>>         ASSERT(error != -ENOSPC);
>>         if (error)
>>             goto out;
>>         sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
>>     }
>>     error = 0;
>>
>> out:
>> +    if (error == 0)
>> +        *bpp = bp;
>
> It looks like the only way error == 0 is where it is set just above, so
> we could probably move this before the label.
I am a bit confused by this code in xfs_attr_shortform_to_leaf():
    ...
    error = xfs_attr3_leaf_create(args, blkno, &bp);
    if (error) {
        error = xfs_da_shrink_inode(args, 0, bp);
        bp = NULL;
        if (error)
            goto out;
        xfs_idata_realloc(dp, size, XFS_ATTR_FORK);    /* try to put */
        memcpy(ifp->if_u1.if_data, tmpbuffer, size);    /* it back */
        goto out;
    }
Here we fail to convert to a leaf. But if xfs_da_shrink_inode()
succeeds, we return error==0 back to the caller. But we actually
failed to convert. This, however, is not directly related to your
comment.

>
> I'm also wondering whether it might be cleaner to 1.) conditionally
> return the buffer when sf->hdr.count == 0 because that is the only case
> where the problem occurs and 2.) do the trans_bhold() here in
> shortform_to_leaf() when we do actually return it.
>
> Brian
>
>>     kmem_free(tmpbuffer);
>>     return error;
>> }
>>

Here is the patch based on kernel 4.13-rc4.

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index de7b9bd..982e322 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -218,6 +218,7 @@
     xfs_fsblock_t        firstblock;
     int            rsvd = (flags & ATTR_ROOT) != 0;
     int            error, err2, local;
+    struct xfs_buf        *leaf_bp = NULL;

     XFS_STATS_INC(mp, xs_attr_set);

@@ -327,7 +328,13 @@
          * GROT: another possible req'mt for a double-split btree op.
          */
         xfs_defer_init(args.dfops, args.firstblock);
-        error = xfs_attr_shortform_to_leaf(&args);
+        error = xfs_attr_shortform_to_leaf(&args, &leaf_bp);
+        /*
+         * Prevent the leaf buffer from being unlocked
+         * when "args.trans" transaction commits.
+         */
+        if (leaf_bp)
+            xfs_trans_bhold(args.trans, leaf_bp);
         if (!error)
             error = xfs_defer_finish(&args.trans, args.dfops, dp);
         if (error) {
@@ -345,6 +352,14 @@
         if (error)
             goto out;

+        /*
+         * Rejoin the leaf buffer to the new transaction.
+         * This allows a subsequent read to find the buffer in the
+         * transaction (and avoid a deadlock).
+         */
+        xfs_trans_bjoin(args.trans, leaf_bp);
+        /* Prevent from being released at the end of the function */
+        leaf_bp = NULL;
     }

     if (xfs_bmap_one_block(dp, XFS_ATTR_FORK))
@@ -376,6 +391,8 @@
 out:
     if (args.trans)
         xfs_trans_cancel(args.trans);
+    if (leaf_bp)
+        xfs_buf_relse(leaf_bp);
     xfs_iunlock(dp, XFS_ILOCK_EXCL);
     return error;
 }
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index c6c15e5..ab73e4b 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -740,9 +740,10 @@ STATIC void xfs_attr3_leaf_moveents(struct
xfs_da_args *args,

 /*
  * Convert from using the shortform to the leaf.
+ * Upon success, return the leaf buffer.
  */
 int
-xfs_attr_shortform_to_leaf(xfs_da_args_t *args)
+xfs_attr_shortform_to_leaf(xfs_da_args_t *args, struct xfs_buf **bpp)
 {
     xfs_inode_t *dp;
     xfs_attr_shortform_t *sf;
@@ -822,6 +823,7 @@ STATIC void xfs_attr3_leaf_moveents(struct
xfs_da_args *args,
         sfe = XFS_ATTR_SF_NEXTENTRY(sfe);
     }
     error = 0;
+    *bpp = bp;

 out:
     kmem_free(tmpbuffer);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.h b/fs/xfs/libxfs/xfs_attr_leaf.h
index f7dda0c..7382d4e 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.h
+++ b/fs/xfs/libxfs/xfs_attr_leaf.h
@@ -48,7 +48,7 @@
 void    xfs_attr_shortform_add(struct xfs_da_args *args, int forkoff);
 int    xfs_attr_shortform_lookup(struct xfs_da_args *args);
 int    xfs_attr_shortform_getvalue(struct xfs_da_args *args);
-int    xfs_attr_shortform_to_leaf(struct xfs_da_args *args);
+int    xfs_attr_shortform_to_leaf(struct xfs_da_args *args, struct
xfs_buf **bpp);
 int    xfs_attr_shortform_remove(struct xfs_da_args *args);
 int    xfs_attr_shortform_allfit(struct xfs_buf *bp, struct xfs_inode *dp);
 int    xfs_attr_shortform_bytesfit(struct xfs_inode *dp, int bytes);
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux