Re: [PATCH v3 07/21] fs: xfs: align args->minlen for forced allocation alignment

John Garry <john.g.garry@xxxxxxxxxx> · Thu, 6 Jun 2024 17:22:19 +0100

On 06/06/2024 09:47, Dave Chinner wrote:
On Wed, Jun 05, 2024 at 03:26:11PM +0100, John Garry wrote:
Hi Dave,

I still think that there is a problem with this code or some other allocator
code which gives rise to unexpected -ENOSPC. I just highlight this code,
above, as I get an unexpected -ENOSPC failure here when the fs does have
many free (big enough) extents. I think that the problem may be elsewhere,
though.

Initially we have a file like this:

  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
    0: [0..127]:        62592..62719      0 (62592..62719)     128
    1: [128..895]:      hole                                   768
    2: [896..1023]:     63616..63743      0 (63616..63743)     128
    3: [1024..1151]:    64896..65023      0 (64896..65023)     128
    4: [1152..1279]:    65664..65791      0 (65664..65791)     128
    5: [1280..1407]:    68224..68351      0 (68224..68351)     128
    6: [1408..1535]:    76416..76543      0 (76416..76543)     128
    7: [1536..1791]:    62720..62975      0 (62720..62975)     256
    8: [1792..1919]:    60032..60159      0 (60032..60159)     128
    9: [1920..2047]:    63488..63615      0 (63488..63615)     128
   10: [2048..2303]:    63744..63999      0 (63744..63999)     256

forcealign extsize is 16 4k fsb, so the layout looks ok.
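
A minimal sketch of that "layout looks ok" check, assuming bmap reports 512-byte sectors and that both the file offset and the block range should sit on a 16-fsb (128-sector) boundary. The helper names are made up for illustration; checking BLOCK-RANGE directly is only meaningful here because every extent is in AG 0.

```python
SECTOR_SIZE = 512      # bmap -v reports ranges in 512-byte units
FSB_SIZE = 4096        # 4k filesystem blocks
EXTSIZE_FSB = 16       # forcealign extsize, in fsb

# Sectors per forcealign unit: 16 * 4096 / 512 = 128
ALIGN_SECTORS = EXTSIZE_FSB * FSB_SIZE // SECTOR_SIZE

# (file_start, file_end, block_start, block_end) in sectors; the hole is omitted
EXTENTS = [
    (0, 127, 62592, 62719),
    (896, 1023, 63616, 63743),
    (1024, 1151, 64896, 65023),
    (1152, 1279, 65664, 65791),
    (1280, 1407, 68224, 68351),
    (1408, 1535, 76416, 76543),
    (1536, 1791, 62720, 62975),
    (1792, 1919, 60032, 60159),
    (1920, 2047, 63488, 63615),
    (2048, 2303, 63744, 63999),
]


def is_aligned(extent, align=ALIGN_SECTORS):
    """True if start offset, start block and length are all multiples of align."""
    file_start, file_end, blk_start, _blk_end = extent
    length = file_end - file_start + 1
    return (file_start % align == 0
            and blk_start % align == 0
            and length % align == 0)


def misaligned(extents, align=ALIGN_SECTORS):
    """Return the extents that break the alignment rule."""
    return [e for e in extents if not is_aligned(e, align)]


print(misaligned(EXTENTS))
```

Every extent in the initial layout passes this check; the later ext #2 at [448..575] would not.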

Then we truncate the file to 454 sectors (or 56.75 fsb). This gives:

  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
    0: [0..127]:        62592..62719      0 (62592..62719)     128
    1: [128..455]:      hole                                   328

We have 57 fsb.
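
The unit conversions behind those numbers, as a small sketch assuming 512-byte sectors and 4k fsb (function names are illustrative only):

```python
SECTOR_SIZE = 512
FSB_SIZE = 4096
SECTORS_PER_FSB = FSB_SIZE // SECTOR_SIZE  # 8 sectors per 4k fsb


def sectors_to_bytes(sectors):
    """Byte offset of a 512-byte sector index."""
    return sectors * SECTOR_SIZE


def sectors_to_fsb(sectors):
    """Size in fsb, possibly fractional."""
    return sectors / SECTORS_PER_FSB


def fsb_covering(sectors):
    """Whole fsb needed to cover the given number of sectors (round up)."""
    return -(-sectors // SECTORS_PER_FSB)


eof_sectors = 454
print(sectors_to_bytes(eof_sectors))   # byte offset of the new EOF
print(sectors_to_fsb(eof_sectors))     # size in fsb
print(fsb_covering(eof_sectors))       # whole fsb shown by bmap
```

Rounding 454 sectors up to whole fsb gives 57 fsb, i.e. 456 sectors, which is why bmap shows the hole ending at sector 455.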

Then I attempt to write from byte offset 232448 (sector 454) and get a
write failure in xfs_bmap_select_minlen() returning -ENOSPC; at that point
the file looks like this:

So you are doing an unaligned write of some size at EOF and EOF is
not aligned to the extsize?

Correct

  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
    0: [0..127]:        62592..62719      0 (62592..62719)     128
    1: [128..447]:      hole                                   320
    2: [448..575]:      62720..62847      0 (62720..62847)     128

That hole in ext #1 is 40 fsb, and not aligned with forcealign granularity.
This means that ext #2 is misaligned wrt forcealign granularity.
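
To make the misalignment concrete, a short sketch (same 512-byte sector and 4k fsb assumptions, illustrative names) that converts the bmap sector numbers to fsb and takes the remainder against the 16-fsb granularity:

```python
SECTORS_PER_FSB = 8    # 4k fsb / 512-byte sectors
EXTSIZE_FSB = 16       # forcealign granularity in fsb


def sector_to_fsb(sector):
    """File offset in fsb for a sector index that lies on an fsb boundary."""
    return sector // SECTORS_PER_FSB


def misalignment(fsb, align=EXTSIZE_FSB):
    """Remainder in fsb past the previous aligned boundary; 0 means aligned."""
    return fsb % align


hole_len_fsb = (447 - 128 + 1) // SECTORS_PER_FSB   # ext #1 hole length
ext2_start_fsb = sector_to_fsb(448)                 # ext #2 start offset

print(hole_len_fsb, misalignment(hole_len_fsb))
print(ext2_start_fsb, misalignment(ext2_start_fsb))
```

Both the hole length (40 fsb) and the start of ext #2 (fsb 56) are 8 fsb past a 16-fsb boundary; the nearest aligned start below fsb 56 is fsb 48.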

OK, so the command to produce this would be something like this?

# xfs_io -fd -c "truncate 0" \
	-c "chattr +<forcealign>" -c "extsize 64k" \
	-c "pwrite 0 64k -b 64k" -c "pwrite 448k 64k -b 64k" \
	-c "bmap -vvp" \
	-c "truncate 227k" \
	-c "bmap -vvp" \
	-c "pwrite 227k 64k -b 64k" \
	-c "bmap -vvp" \
	/mnt/scratch/testfile

No, unfortunately not. Well maybe not on a clean filesystem. In my 
stress test, something else is causing this. Probably heavy fragmentation.

This is strange.

I notice that when we allocate ext #2, xfs_bmap_btalloc() returns
ap->blkno=7840, length=16, offset=56. I would expect offset % 16 == 0, which
it is not.

IOWs, the allocation was not correctly rounded down to an aligned
start offset.  What were the initial parameters passed to this
allocation?

For xfs_bmap_btalloc() entry,

ap->offset=48, length=32, blkno=0, total=0, minlen=1, minleft=1, eof=1, 
wasdel=0, aeof=0, conv=0, datatype=5, flags=0x8

i.e. why didn't it round the start offset down to 48?
Answering that question will tell you where the bug is.

After xfs_bmap_compute_alignments() -> xfs_bmap_extsize_align(), 
ap->offset=48 - that seems ok.

Maybe the problem is in xfs_bmap_process_allocated_extent(). For the 
problematic case when calling that function:

args->fsbno=7840 args->len=16 ap->offset=48 orig_offset=56 orig_length=24

So, as the comment reads there, we could not satisfy the original length 
request, so we move up the position of the extent.

I assume that we just don't want to do that for forcealign, correct?
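
A user-space model of that fix-up, to show how the numbers above turn offset 48 into offset 56. The two conditions are paraphrased from my reading of the upstream xfs_bmap_process_allocated_extent() and may not match the tree under discussion exactly; the `forcealign` parameter is hypothetical and only illustrates the "don't move the extent" idea, it is not an actual patch.

```python
def fixup_offset(alloc_offset, alloc_len, orig_offset, orig_length,
                 forcealign=False):
    """Return the file offset (in fsb) at which the new extent gets mapped.

    alloc_offset/alloc_len: extsize-aligned offset and the length actually
    allocated; orig_offset/orig_length: what the caller originally asked for.
    """
    if forcealign:
        # Hypothetical: never slide the mapping, keep the aligned offset.
        return alloc_offset
    if alloc_len <= orig_length:
        # Allocation is too short for the original request: map it at the
        # original (possibly unaligned) offset.
        return orig_offset
    if alloc_offset + alloc_len < orig_offset + orig_length:
        # Allocation ends before the original request does: slide it up so
        # that it ends where the original request ends.
        return orig_offset + orig_length - alloc_len
    return alloc_offset


# Values from the problematic case above.
print(fixup_offset(48, 16, 56, 24))
print(fixup_offset(48, 16, 56, 24, forcealign=True))
```

With args->len=16 and orig_length=24 the first branch fires, which matches the offset=56 that xfs_bmap_btalloc() returned.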

Of course, if the allocation start is rounded down to 48, then
the length should be rounded up to 32 to cover the entire range we
are writing new data to.
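
The rounding described here, as a simplified sketch. This is plain round-down/round-up arithmetic in fsb, not the real xfs_bmap_extsize_align(), which handles more cases (neighbouring extents, EOF, realtime); the function name is illustrative.

```python
def align_request(offset, length, extsz):
    """Round the start down and the end up to extsz boundaries (all in fsb).

    Returns (aligned_offset, aligned_length).
    """
    start = offset - offset % extsz          # round start down
    end = offset + length
    aligned_end = -(-end // extsz) * extsz   # round end up
    return start, aligned_end - start


# Original request from the trace above: orig_offset=56, orig_length=24,
# with a 16-fsb extsize.
print(align_request(56, 24, 16))
```

For the request above this yields offset 48 and length 32, the same values seen at xfs_bmap_btalloc() entry.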

In the following sub-io block zeroing, I note that we zero the front padding
from pos=196608 (or fsb 48 or sector 384) for len=35840, and back padding
from pos=263680 for len=64000 (up to sector 640 or fsb 80). That seems wrong,
as we are zeroing data in the ext #1 hole, right?

The sub block zeroing is doing exactly the right thing - it is
demonstrating the exact range that the force aligned allocation
should have covered.
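
A sketch of how those two zeroing ranges fall out of the write position and the 64k granularity. The write length of 31232 bytes is inferred from the two reported ranges (263680 - 232448) and is not stated in the thread; names are illustrative.

```python
EXTSIZE_BYTES = 16 * 4096   # forcealign granularity: 16 x 4k fsb = 64k


def padding_ranges(pos, length, align=EXTSIZE_BYTES):
    """Return ((front_pos, front_len), (back_pos, back_len)) in bytes.

    The front range runs from the aligned start up to the write; the back
    range runs from the end of the write up to the aligned end.
    """
    start = pos - pos % align                 # round write start down
    write_end = pos + length
    end = -(-write_end // align) * align      # round write end up
    return (start, pos - start), (write_end, end - write_end)


front, back = padding_ranges(232448, 31232)
print(front, back)
print(front[0] // 4096, (back[0] + back[1]) // 4096)   # covered fsb range
```

The two ranges together with the write span bytes 196608 to 327680, i.e. fsb 48 to 80, which is the 32-fsb range the allocation should have covered.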

Agreed

Now the actual -ENOSPC comes from xfs_bmap_btalloc() -> ... ->
xfs_bmap_select_minlen() with initially blen=32 args->alignment=16
ap->minlen=1 args->maxlen=8. There xfs_bmap_btalloc() has ap->length=8
initially. This may be just a symptom.

Yeah, now the allocator is trying to fix up the mess that the first unaligned
allocation created, and it's tripping over ENOSPC because it's not
allowed to do sub-extent size hint allocations when forced alignment
is enabled....

I guess that there is something wrong in the block allocator for ext #2. Any
idea where to check?

Start with tracing exactly what range iomap is requesting be
allocated, and then follow that through into the allocator to work
out why the offset being passed to the allocation never gets rounded
down to be aligned. There's a mistake in the logic somewhere that is
failing to apply the start alignment to the allocation request, i.e.
the bug will be in the allocation setup code path, somewhere
in the xfs_bmapi_write -> xfs_bmap_btalloc path *before* we get to
the xfs_alloc_vextent...() calls.

As above, the problem seems to be in the post-allocation fix-up in
xfs_bmap_process_allocated_extent().

Thanks,
John