On Fri, Sep 06, 2024 at 03:31:43PM +0100, John Garry wrote: > On 05/09/2024 22:47, Dave Chinner wrote: > > > > > If the start or end of the extent which needs unmapping is > > > > > unaligned then we convert that extent to unwritten and skip, > > > > > is it? (__xfs_bunmapi()) > > > > The high level code should be aligning the start and end of the > > > > file range to be removed via xfs_inode_alloc_unitsize(). > > > Is that the case for something like truncate? There we just say what is the > > > end block which we want to truncate to in > > > xfs_itruncate_extents_flags(new_size) -> > > > xfs_bunmapi_range(XFS_B_TO_FSB(new_size)), and that may not be alloc unit > > > aligned. > > Ah, I thought we had that alignment in xfs_itruncate_extents_flags() > > already, but if we don't then that's a bug that needs to be fixed. > > AFAICS, forcealign behaviour is same as RT, so then this would be a mainline > bug, right? No, I don't think so. I think this is a case where forcealign and RT behaviours differ, primarily because RT doesn't actually care about file offset -> physical offset alignment and can do unaligned IO whenever it wants. Hence having an unaligned written->unwritten extent state transition doesnt' affect anything for rt files... > > > > We change the space reservation in xfs-setattr_size() for this case > > (patch 9) but then don't do any alignment there - it relies on > > xfs_itruncate_extents_flags() to do the right thing w.r.t. extent > > removal alignment w.r.t. the new EOF. > > > > i.e. The xfs_setattr_size() code takes care of EOF block zeroing and > > page cache removal so the user doesn't see old data beyond EOF, > > whilst xfs_itruncate_extents_flags() is supposed to take care of the > > extent removal and the details of that operation (e.g. alignment). > > So we should roundup the unmap block to the alloc unit, correct? I have my > doubts about that, and thought that xfs_bunmapi_range() takes care of any > alignment handling. The start should round up, yes. And, no, xfs_bunmapi_range() isn't specifically "taking care" of alignment. It's just marking a partial alloc unit range as unwritten, which means we now have -unaligned extents- in the BMBT. > > > > > Patch 10 also modifies xfs_can_free_eofblocks() to take alignment > > into account for the post-eof block removal, but doesn't change > > xfs_free_eofblocks() at all. i.e it also relies on > > xfs_itruncate_extents_flags() to do the right thing for force > > aligned inodes. > > What state should the blocks post-EOF blocks be? A simple example of > partially truncating an alloc unit is: > > $xfs_io -c "extsize" mnt/file > [16384] mnt/file > > > $xfs_bmap -vvp mnt/file > mnt/file: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..20479]: 192..20671 0 (192..20671) 20480 000000 > > > $truncate -s 10461184 mnt/file # 10M - 6FSB > > $xfs_bmap -vvp mnt/file > mnt/file: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..20431]: 192..20623 0 (192..20623) 20432 000000 > 1: [20432..20447]: 20624..20639 0 (20624..20639) 16 010000 > FLAG Values: > 0010000 Unwritten preallocated extent > > Is that incorrect state? Think about it: what happens if you now truncate it back up to 10MB (i.e. aligned length) and then do an aligned atomic write on it. First: What happens when you truncate up? ...... Yes, iomap_zero_range() will see the unwritten extent and skip it. i.e. The unwritten extent stays as an unwritten extent, it's now within EOF. That written->unwritten extent boundary is not on an aligned file offset. Second: What happens when you do a correctly aligned atomic write that spans this range now? ...... Iomap only maps a single extent at a time, so it will only map the written range from the start of the IO (aligned) to the start of the unwritten extent (unaligned). Hence the atomic write will be rejected because we can't do the atomic write to such an unaligned extent. That's not a bug in the atomic write path - this failure occurs because of the truncate behaviour doing post-eof unwritten extent conversion.... Yes, I agree that the entire -physical- extent is still correctly aligned on disk so you could argue that the unwritten conversion that xfs_bunmapi_range is doing is valid forced alignment behaviour. However, the fact is that breaking the aligned physical extent into two unaligned contiguous extents in different states in the BMBT means that they are treated as two seperate unaligned extents, not one contiguous aligned physical extent. This extent state discontiunity is results in breaking physical IO across the extent state boundary. Hence such an unaligned extent state change violates the physical IO alignment guarantees that forced alignment is supposed to provide atomic writes... This is the reason why operations like hole punching round to extent size for forced alignment at the highest level. i.e. We cannot allow xfs_bunmapi_range() to "take care" of alignment handling because managing partial extent unmappings with sub-alloc-unit unwritten extents does not work for forced alignment. Again, this is different to the traditional RT file behaviour - it can use unwritten extents for sub-alloc-unit alignment unmaps because the RT device can align file offset to any physical offset, and issue unaligned sector sized IO without any restrictions. Forced alignment does not have this freedom, and when we extend forced alignment to RT files, it will not have the freedom to use unwritten extents for sub-alloc-unit unmapping, either. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx