Re: [PATCH v4 00/14] forcealign for xfs

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 9 Sep 2024 08:49:44 +1000

On Fri, Sep 06, 2024 at 03:31:43PM +0100, John Garry wrote:
> On 05/09/2024 22:47, Dave Chinner wrote:
> > > > >   If the start or end of the extent which needs unmapping is
> > > > >        unaligned then we convert that extent to unwritten and skip,
> > > > >        is it? (__xfs_bunmapi())
> > > > The high level code should be aligning the start and end of the
> > > > file range to be removed via xfs_inode_alloc_unitsize().
> > > Is that the case for something like truncate? There we just say what is the
> > > end block which we want to truncate to in
> > > xfs_itruncate_extents_flags(new_size)  ->
> > > xfs_bunmapi_range(XFS_B_TO_FSB(new_size)), and that may not be alloc unit
> > > aligned.
> > Ah, I thought we had that alignment in xfs_itruncate_extents_flags()
> > already, but if we don't then that's a bug that needs to be fixed.
> 
> AFAICS, forcealign behaviour is same as RT, so then this would be a mainline
> bug, right?

No, I don't think so. I think this is a case where forcealign and RT
behaviours differ, primarily because RT doesn't actually care about
file offset -> physical offset alignment and can do unaligned IO
whenever it wants. Hence having an unaligned written->unwritten
extent state transition doesnt' affect anything for rt files...

> 
> > > We change the space reservation in xfs-setattr_size() for this case
> > (patch 9) but then don't do any alignment there - it relies on
> > xfs_itruncate_extents_flags() to do the right thing w.r.t. extent
> > removal alignment w.r.t. the new EOF.
> > 
> > i.e. The xfs_setattr_size() code takes care of EOF block zeroing and
> > page cache removal so the user doesn't see old data beyond EOF,
> > whilst xfs_itruncate_extents_flags() is supposed to take care of the
> > extent removal and the details of that operation (e.g. alignment).
> 
> So we should roundup the unmap block to the alloc unit, correct? I have my
> doubts about that, and thought that xfs_bunmapi_range() takes care of any
> alignment handling.

The start should round up, yes. And, no, xfs_bunmapi_range() isn't
specifically "taking care" of alignment. It's just marking a partial
alloc unit range as unwritten, which means we now have -unaligned
extents- in the BMBT. 

> 
> > 
> > Patch 10 also modifies xfs_can_free_eofblocks() to take alignment
> > into account for the post-eof block removal, but doesn't change
> > xfs_free_eofblocks() at all. i.e  it also relies on
> > xfs_itruncate_extents_flags() to do the right thing for force
> > aligned inodes.
> 
> What state should the blocks post-EOF blocks be? A simple example of
> partially truncating an alloc unit is:
> 
> $xfs_io -c "extsize" mnt/file
> [16384] mnt/file
> 
> 
> $xfs_bmap -vvp mnt/file
> mnt/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..20479]:      192..20671        0 (192..20671)     20480 000000
> 
> 
> $truncate -s 10461184 mnt/file # 10M - 6FSB
> 
> $xfs_bmap -vvp mnt/file
> mnt/file:
>  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
>    0: [0..20431]:      192..20623        0 (192..20623)     20432 000000
>    1: [20432..20447]:  20624..20639      0 (20624..20639)      16 010000
>  FLAG Values:
>     0010000 Unwritten preallocated extent
> 
> Is that incorrect state?

Think about it: what happens if you now truncate it back up to 10MB
(i.e. aligned length) and then do an aligned atomic write on it.

First: What happens when you truncate up?

......

Yes, iomap_zero_range() will see the unwritten extent and skip it.
i.e. The unwritten extent stays as an unwritten extent, it's now
within EOF. That written->unwritten extent boundary is not on an
aligned file offset.

Second: What happens when you do a correctly aligned atomic write
that spans this range now?

......

Iomap only maps a single extent at a time, so it will only map the
written range from the start of the IO (aligned) to the start of the
unwritten extent (unaligned).  Hence the atomic write will be
rejected because we can't do the atomic write to such an unaligned
extent.

That's not a bug in the atomic write path - this failure occurs
because of the truncate behaviour doing post-eof unwritten extent
conversion....

Yes, I agree that the entire -physical- extent is still correctly
aligned on disk so you could argue that the unwritten conversion
that xfs_bunmapi_range is doing is valid forced alignment behaviour.
However, the fact is that breaking the aligned physical extent into
two unaligned contiguous extents in different states in the BMBT
means that they are treated as two seperate unaligned extents, not
one contiguous aligned physical extent.

This extent state discontiunity is results in breaking physical IO
across the extent state boundary. Hence such an unaligned extent
state change violates the physical IO alignment guarantees that
forced alignment is supposed to provide atomic writes...

This is the reason why operations like hole punching round to extent
size for forced alignment at the highest level. i.e. We cannot allow
xfs_bunmapi_range() to "take care" of alignment handling because
managing partial extent unmappings with sub-alloc-unit unwritten
extents does not work for forced alignment.

Again, this is different to the traditional RT file behaviour - it
can use unwritten extents for sub-alloc-unit alignment unmaps
because the RT device can align file offset to any physical offset,
and issue unaligned sector sized IO without any restrictions. Forced
alignment does not have this freedom, and when we extend forced
alignment to RT files, it will not have the freedom to use
unwritten extents for sub-alloc-unit unmapping, either.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx