On Thu, Sep 19, 2024 at 12:43:17PM +0530, Ojaswin Mujoo wrote:
> On Wed, Sep 18, 2024 at 07:54:27PM +1000, Dave Chinner wrote:
> > On Wed, Sep 11, 2024 at 02:31:04PM +0530, Ojaswin Mujoo wrote:
> > Behaviour such as extent size hinting *should* be the same across
> > all filesystems that provide this functionality. This makes using
> > extent size hints much easier for users, admins and application
> > developers. The last thing I want to hear is application devs tell
> > me at conferences that "we don't use extent size hints anymore
> > because ext4..."
>
> Yes, makes sense :)
>
> Nothing to worry here tho as ext4 also treats the extsize value as a
> hint exactly like XFS. We have tried to keep the behavior as similar
> to XFS as possible for the exact reasons you mentioned.

It is worth explicitly stating this (i.e. all the behaviours that
are the same) in the design documentation rather than just the
corner cases where it is different. It was certainly not clear how
failures were treated.

> And yes, we do plan to add a forcealign (or similar) feature for ext4 as
> well for atomic writes which would change the hint to a mandate

Ok. That should be stated, too.

FWIW, it would be a good idea to document this all in the kernel
documentation itself, so there is a guideline for other filesystems
to implement the same behaviour. e.g. in
Documentation/filesystems/extent-size-hints.rst

> > > 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
> > > hint. That means on XFS for eof allocations (with extsize hint) only logical
> > > start gets aligned.
> >
> > I'm not sure I understand what you are saying here. XFS does extsize
> > alignment of both the start and end of post-eof extents the same as
> > it does for extents within EOF.
> > For example:
> >
> > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "bmap -vvp" foo
> > wrote 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0308 sec (129.815 KiB/sec and 32.4538 ops/sec)
> > foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL FLAGS
> >    0: [0..7]:          256504..256511    0 (256504..256511)       8 000000
> >    1: [8..31]:         256512..256535    0 (256512..256535)      24 010000
> >  FLAG Values:
> >     0100000 Shared extent
> >     0010000 Unwritten preallocated extent
> >
> > There's a 4k written extent at 0, and a 12k unwritten extent
> > beyond EOF at 4k. I.e. we have an extent of 16kB as the hint
> > required that is correctly aligned beyond EOF.
> >
> > If I then write another 4k at 20k (beyond both EOF and the unwritten
> > extent beyond EOF:
> >
> > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "pwrite 20k 4k" -c "bmap -vvp" foo
> > wrote 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0210 sec (190.195 KiB/sec and 47.5489 ops/sec)
> > wrote 4096/4096 bytes at offset 20480
> > 4 KiB, 1 ops; 0.0001 sec (21.701 MiB/sec and 5555.5556 ops/sec)
> > foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET          TOTAL FLAGS
> >    0: [0..7]:          180000..180007    0 (180000..180007)       8 000000
> >    1: [8..39]:         180008..180039    0 (180008..180039)      32 010000
> >    2: [40..47]:        180040..180047    0 (180040..180047)       8 000000
> >    3: [48..63]:        180048..180063    0 (180048..180063)      16 010000
> >  FLAG Values:
> >     0100000 Shared extent
> >     0010000 Unwritten preallocated extent
> >
> > You can see we did contiguous allocation of another 16kB at offset
> > 16kB, and then wrote to 20k for 4kB.. i.e. the new extent was
> > correctly aligned at both sides as the extsize hint says it should
> > be....
>
> Sorry for the confusion Dave. What was meant is that XFS would indeed
> respect extsize hint for EOF allocations but if we close the file, since
> we trim the blocks past EOF upon close, we would only see that the
> lstart is aligned but the end would not.
Right, but that is desired behaviour, especially when extsize is
large. i.e. when the file is closed it is an indication that the
file will not be written again, so we don't need to keep post-eof
blocks around for fragmentation prevention reasons.

Removing post-EOF extents on close prevents large extsize hints from
consuming lots of unused space on files that are never going to be
written to again(*). That's user visible, and because it can cause
premature ENOSPC, users will report this excessive space usage
behaviour as a bug (and they are right).

Hence removing post-eof extents on file close when extent size hints
are in use comes under the guise of Good Behaviour To Have.

(*) think about how much space is wasted if you clone a kernel git
tree under a 1MB extent size hint directory. All those tiny header
files now take up 1MB of space on disk....

Keep in mind that when the file is opened for write again, the
extent size hint still gets applied to the new extents. If the
extending write starts beyond the EOF extsize range, then the new
extent after the hole at EOF will be fully extsize aligned, as
expected.

If the new write is exactly extending the file, then the new extents
will not be extsize aligned - the start will be at the EOF block,
and they will be extsize -length-. IOWs, the extent size is
maintained, just the logical alignment is not exactly extsize
aligned. This could be considered a bug, but it's never been an
issue for anyone because, in XFS, physical extent alignment is
separate (and maintained regardless of logical alignment) for extent
size hint based allocations.

Adding force-align will prevent this behaviour from occurring, as
post-eof trimming will be done to extsize alignment, not to the EOF
block. Hence open/close/open will not affect logical or physical
alignment of force-align extents (and hence won't affect atomic
writes).

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
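
For anyone who wants to check the arithmetic in this mail, the
following is a toy model only, not the XFS allocator. It assumes the
hint is applied by rounding the written range outward to extsize
boundaries, that trimming on close goes back to the EOF block (or to
an extsize boundary with force-align), and that the filesystem block
size is 4k, which the mail does not state. All function names are
made up for this sketch.

```python
KIB = 1024
BLOCK = 4 * KIB  # assumed filesystem block size (not stated in the mail)


def round_down(value, align):
    return (value // align) * align


def round_up(value, align):
    return -(-value // align) * align


def extsize_aligned_range(offset, length, extsize):
    """Round the written byte range outward to extsize boundaries."""
    return round_down(offset, extsize), round_up(offset + length, extsize)


def alloc_end_after_close(eof, extsize, force_align=False):
    """Where allocated space ends once post-EOF blocks are trimmed on close.

    Plain hint: trim back to the EOF block.
    force-align: trim back to an extsize boundary instead.
    """
    return round_up(eof, extsize if force_align else BLOCK)


def exactly_extending_extent(alloc_end, extsize):
    """New extent for a write that starts right at the trimmed EOF block:
    it begins at the EOF block and is extsize long."""
    return alloc_end, alloc_end + extsize


extsize = 16 * KIB

# The two pwrite examples: 4k at offset 0, then 4k at offset 20k.
first = extsize_aligned_range(0, 4 * KIB, extsize)
second = extsize_aligned_range(20 * KIB, 4 * KIB, extsize)
print(first, second)  # (0, 16384) (16384, 32768)

# Close a 4k file, then reopen and extend it exactly at EOF.
hint_end = alloc_end_after_close(4 * KIB, extsize)
print(exactly_extending_extent(hint_end, extsize))  # (4096, 20480)

# With force-align the trim stops at the extsize boundary.
print(alloc_end_after_close(4 * KIB, extsize, force_align=True))  # 16384

# The footnote's case: a 4k file under a 1MB hint if nothing were trimmed.
print(round_up(4 * KIB, 1024 * KIB) - 4 * KIB)  # 1044480 bytes unused
```

In the 512-byte units that bmap prints, the first two ranges are
[0..31] and [32..63], which is what the extents in the quoted bmap
output add up to. The (4096, 20480) result is the "extsize -length-
but not extsize aligned" case: 16k long, starting at 4k.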