On Fri, Sep 20, 2024 at 08:34:14AM +1000, Dave Chinner wrote:
> On Thu, Sep 19, 2024 at 12:43:17PM +0530, Ojaswin Mujoo wrote:
> > On Wed, Sep 18, 2024 at 07:54:27PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 11, 2024 at 02:31:04PM +0530, Ojaswin Mujoo wrote:
> > > Behaviour such as extent size hinting *should* be the same across
> > > all filesystems that provide this functionality. This makes using
> > > extent size hints much easier for users, admins and application
> > > developers. The last thing I want to hear is application devs tell
> > > me at conferences that "we don't use extent size hints anymore
> > > because ext4..."
> >
> > Yes, makes sense :)
> >
> > Nothing to worry here tho as ext4 also treats the extsize value as a
> > hint exactly like XFS. We have tried to keep the behavior as similar
> > to XFS as possible for the exact reasons you mentioned.
>
> It is worth explicitly stating this (i.e. all the behaviours that
> are the same) in the design documentation rather than just the
> corner cases where it is different. It was certainly not clear how
> failures were treated.

Got it Dave, I did mention it in the actual commit 5/5 but I agree. I
will update the cover letter to be more clear about the design in future
revisions.

>
> > And yes, we do plan to add a forcealign (or similar) feature for ext4 as
> > well for atomic writes which would change the hint to a mandate
>
> Ok. That should be stated, too.
>
> FWIW, it would be a good idea to document this all in the kernel
> documentation itself, so there is a guideline for other filesystems
> to implement the same behaviour. e.g. in
> Documentation/filesystems/extent-size-hints.rst

Okay makes sense, I can look into this as a next step.

>
> > > > 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
> > > > hint. That means on XFS for eof allocations (with extsize hint) only logical
> > > > start gets aligned.
> > >
> > > I'm not sure I understand what you are saying here.
> > > XFS does extsize
> > > alignment of both the start and end of post-eof extents the same as
> > > it does for extents within EOF. For example:
> > >
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0308 sec (129.815 KiB/sec and 32.4538 ops/sec)
> > > foo:
> > >  EXT: FILE-OFFSET      BLOCK-RANGE       AG AG-OFFSET          TOTAL FLAGS
> > >    0: [0..7]:          256504..256511     0 (256504..256511)       8 000000
> > >    1: [8..31]:         256512..256535     0 (256512..256535)      24 010000
> > >  FLAG Values:
> > >     0100000 Shared extent
> > >     0010000 Unwritten preallocated extent
> > >
> > > There's a 4k written extent at 0, and a 12k unwritten extent
> > > beyond EOF at 4k. I.e. we have an extent of 16kB as the hint
> > > required that is correctly aligned beyond EOF.
> > >
> > > If I then write another 4k at 20k (beyond both EOF and the unwritten
> > > extent beyond EOF):
> > >
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "pwrite 20k 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0210 sec (190.195 KiB/sec and 47.5489 ops/sec)
> > > wrote 4096/4096 bytes at offset 20480
> > > 4 KiB, 1 ops; 0.0001 sec (21.701 MiB/sec and 5555.5556 ops/sec)
> > > foo:
> > >  EXT: FILE-OFFSET      BLOCK-RANGE       AG AG-OFFSET          TOTAL FLAGS
> > >    0: [0..7]:          180000..180007     0 (180000..180007)       8 000000
> > >    1: [8..39]:         180008..180039     0 (180008..180039)      32 010000
> > >    2: [40..47]:        180040..180047     0 (180040..180047)       8 000000
> > >    3: [48..63]:        180048..180063     0 (180048..180063)      16 010000
> > >  FLAG Values:
> > >     0100000 Shared extent
> > >     0010000 Unwritten preallocated extent
> > >
> > > You can see we did contiguous allocation of another 16kB at offset
> > > 16kB, and then wrote to 20k for 4kB.. i.e. the new extent was
> > > correctly aligned at both sides as the extsize hint says it should
> > > be....
> >
> > Sorry for the confusion Dave.
> > What was meant is that XFS would indeed
> > respect extsize hint for EOF allocations but if we close the file, since
> > we trim the blocks past EOF upon close, we would only see that the
> > lstart is aligned but the end would not.
>
> Right, but that is desired behaviour, especially when extsize is
> large. i.e. when the file is closed it is an indication that the
> file will not be written again, so we don't need to keep post-eof
> blocks around for fragmentation prevention reasons.
>
> Removing post-EOF extents on close prevents large extsize hints from
> consuming lots of unused space on files that are never going to be
> written to again(*). That's user visible, and because it can cause
> premature ENOSPC, users will report this excessive space usage
> behaviour as a bug (and they are right). Hence removing post-eof
> extents on file close when extent size hints are in use comes under
> the guise of Good Behaviour To Have.
>
> (*) think about how much space is wasted if you clone a kernel git
> tree under a 1MB extent size hint directory. All those tiny header
> files now take up 1MB of space on disk....
>
> Keep in mind that when the file is opened for write again, the
> extent size hint still gets applied to the new extents. If the
> extending write starts beyond the EOF extsize range, then the new
> extent after the hole at EOF will be fully extsize aligned, as
> expected.
>
> If the new write is exactly extending the file, then the new extents
> will not be extsize aligned - the start will be at the EOF block,
> and they will be extsize -length-. IOWs, the extent size is
> maintained, just the logical alignment is not exactly extsize
> aligned. This could be considered a bug, but it's never been an
> issue for anyone because, in XFS, physical extent alignment is
> separate (and maintained regardless of logical alignment) for extent
> size hint based allocations.
>
> Adding force-align will prevent this behaviour from occurring, as
> post-eof trimming will be done to extsize alignment, not to the EOF
> block. Hence open/close/open will not affect logical or physical
> alignment of force-align extents (and hence won't affect atomic
> writes).

Thanks for the context, I will try to keep this behavior similar to XFS
once we implement the EOF support for extsize hints in the next revision.

Regards,
Ojaswin

>
> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
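
For reference, the alignment and trimming rules discussed in this thread
can be sketched as a small arithmetic model. This is only an illustration,
not kernel code: the helper names (extsize_align, trim_at_close) are
hypothetical, the 4k filesystem block size is an assumption, and the
forcealign branch models the behaviour Dave describes for a feature that
is still being proposed.

```python
KIB = 1024


def round_down(value, unit):
    """Largest multiple of unit that is <= value."""
    return (value // unit) * unit


def round_up(value, unit):
    """Smallest multiple of unit that is >= value."""
    return -(-value // unit) * unit


def extsize_align(offset, length, extsize):
    """Widen a write of length bytes at offset to extsize boundaries.

    Returns (start, end) of the allocation in bytes: the start is
    rounded down and the end is rounded up, so both sides are aligned.
    """
    return round_down(offset, extsize), round_up(offset + length, extsize)


def trim_at_close(alloc_end, eof, blocksize, extsize, forcealign=False):
    """End of the allocation after post-EOF blocks are trimmed on close.

    Without forcealign the trim goes down to the EOF block; with
    forcealign (assumed behaviour) it stops at extsize alignment.
    """
    unit = extsize if forcealign else blocksize
    return min(alloc_end, round_up(eof, unit))


if __name__ == "__main__":
    extsize = 16 * KIB
    blocksize = 4 * KIB  # assumed filesystem block size

    # The two pwrite cases from the xfs_io examples above.
    first = extsize_align(0, 4 * KIB, extsize)
    second = extsize_align(20 * KIB, 4 * KIB, extsize)
    print("pwrite 0 4k   ->", first)
    print("pwrite 20k 4k ->", second)

    # Closing the file after the first write, with EOF at 4k.
    print("close, hint only  ->", trim_at_close(first[1], 4 * KIB, blocksize, extsize))
    print("close, forcealign ->", trim_at_close(first[1], 4 * KIB, blocksize, extsize, forcealign=True))
```

With a 16k hint, the 4k write at offset 0 maps to the range 0..16k and the
4k write at 20k maps to 16k..32k, matching the bmap output above (sectors
0..31 and 32..63). After close, the hint-only case keeps just the 4k EOF
block, while the forcealign case keeps the full 16k.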