Re: [PATCH] xfs: fix indlen accounting error on partial delalloc conversion

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 03, 2017 at 02:46:26PM -0400, Brian Foster wrote:
> The delalloc -> real block conversion path uses an incorrect
> calculation in the case where the middle part of a delalloc extent
> is being converted. This is documented as a rare situation because
> XFS generally attempts to maximize contiguity by converting as much
> of a delalloc extent as possible.
> 
> If this situation does occur, the indlen reservation for the two new
> delalloc extents left behind by the conversion of the middle range
> is calculated and compared with the original reservation. If more
> blocks are required, the delta is allocated from the global block
> pool. This delta value can be characterized as the difference
> between the new total requirement (temp + temp2) and the currently
> available reservation minus those blocks that have already been
> allocated (startblockval(PREV.br_startblock) - allocated).
> 
> The problem is that the current code does not account for previously
> allocated blocks correctly. It subtracts the current allocation
> count from the (new - old) delta rather than the old indlen
> reservation. This means that more indlen blocks than have been
> allocated end up stashed in the remaining extents and free space
> accounting is broken as a result.
> 
> Fix up the calculation to subtract the allocated block count from
> the original extent indlen and thus correctly allocate the
> reservation delta based on the difference between the new total
> requirement and the unused blocks from the original reservation.
> Also remove a bogus assert that contradicts the fact that the new
> indlen reservation can be larger than the original indlen
> reservation.
> 
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
> 
> Hi all,
> 
> So while I fortunately was able to reproduce this problem and verify the
> patch, I'm not able to do so in any way that facilitates a generic test
> case. To reproduce, I had to hack up the allocation path to always do
> single physical block allocations and otherwise not attempt to convert
> whole delalloc extents, hack up an ioctl to do a
> filemap_write_and_wait_range(), and run a test that repeatedly does 1M
> buffered writes followed by fragmented, page-sized writebacks of that 1M
> extent. This forces partial (middle) extent conversions until the extent
> tree grows enough to require a bmbt block allocation. When that occurs,
> I always end up with the following free space accounting inconsistency
> reported via xfs_repair:
> 
> Phase 1 - find and verify superblock...     
> Phase 2 - using internal log
>         - zero log...               
>         - scan filesystem freespace and inode maps...
> sb_fdblocks 3929316, counted 3929314             
> ...
> 
> With this patch applied, the same test no longer reproduces the
> corruption. This otherwise survives my xfstests testing (without the
> aforementioned hacks). Thoughts, reviews, flames appreciated.
> 
> Brian
> 
>  fs/xfs/libxfs/xfs_bmap.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index f02eb76..8adb91b 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -2065,8 +2065,10 @@ xfs_bmap_add_extent_delay_real(
>  		}
>  		temp = xfs_bmap_worst_indlen(bma->ip, temp);
>  		temp2 = xfs_bmap_worst_indlen(bma->ip, temp2);
> -		diff = (int)(temp + temp2 - startblockval(PREV.br_startblock) -
> -			(bma->cur ? bma->cur->bc_private.b.allocated : 0));
> +		diff = (int)(temp + temp2 -
> +			     (startblockval(PREV.br_startblock) -
> +			      (bma->cur ?
> +			       bma->cur->bc_private.b.allocated : 0)));

<thinking aloud again>

Prior to this operation we made a delalloc reservation in which we
reserved a number of blocks for bmbt expansion by subtracting that
quantity from incore fdblocks; that is startblockval(PREV.br_startblock).
Then we did some bmbt magic which used up cur->bc_private.b.allocated
blocks from the reservation.  Since those blocks were already subtracted
from the incore fdblocks, we don't need to do that now.  After the magic
was over, we now have two smaller delalloc reservations (with a real
extent in the middle), each with their own reservations for bmbt shape
changing expansion.

Now we aim to make the accounting of incore fdblocks square again by
checking if we are now hanging on to more blocks than we were before and
subtracting the difference from fdblocks if so.  (We deal separately with
the possibility of hanging onto /fewer/ blocks than before.)

Ok, so first we reserved startblockval(PREV.br_startblock).

After splitting the da extent in the middle we now have two da extents;
the effective indlen reservation of the whole range we just touched is:
xfs_bmap_worst_indlen(bma->ip, temp) + xfs_bmap_worst_indlen(bma->ip, temp2);

Then we (may have) fed bma->cur->bc_private.b.allocated blocks to the bmbt.
The number of blocks we now know about is:

xfs_bmap_worst_indlen(bma->ip, temp) +
xfs_bmap_worst_indlen(bma->ip, temp2) +
bma->cur->bc_private.b.allocated

Therefore, the number of blocks we are over (if at all) is:

xfs_bmap_worst_indlen(bma->ip, temp) +
xfs_bmap_worst_indlen(bma->ip, temp2) +
bma->cur->bc_private.b.allocated -
startblockval(PREV.br_startblock)

Or, put another way:

(xfs_bmap_worst_indlen(bma->ip, temp) +
 xfs_bmap_worst_indlen(bma->ip, temp2)) -
(startblockval(PREV.br_startblock) -
 bma->cur->bc_private.b.allocated)

(Exactly what Brian wrote above.)

If that quantity is positive, we're now hanging on to more blocks than
we were before, so we subtract that many from fdblocks.  That part looks
ok, though I have one question -- why is diff an int?  temp/temp2 are
both xfs_filblks_t (__uint64_t), so the whole expression is promoted to
a 64-bit operation, then the results are squashed down into a 32-bit
int, which is then cast back to int64_t.  With our short MAXEXTLEN I
think it's impossible ever to have diff > 2^32, but the casting back and
forth caught my eye.

Aside from that, it looks ok...
Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

--D

>  		if (diff > 0) {
>  			error = xfs_mod_fdblocks(bma->ip->i_mount,
>  						 -((int64_t)diff), false);
> @@ -2123,7 +2125,6 @@ xfs_bmap_add_extent_delay_real(
>  		temp = da_new;
>  		if (bma->cur)
>  			temp += bma->cur->bc_private.b.allocated;
> -		ASSERT(temp <= da_old);
>  		if (temp < da_old)
>  			xfs_mod_fdblocks(bma->ip->i_mount,
>  					(int64_t)(da_old - temp), false);
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux