Re: [RFC 0/5] ext4: Implement support for extsize hints

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 20 Sep 2024 08:34:14 +1000

On Thu, Sep 19, 2024 at 12:43:17PM +0530, Ojaswin Mujoo wrote:
> On Wed, Sep 18, 2024 at 07:54:27PM +1000, Dave Chinner wrote:
> > On Wed, Sep 11, 2024 at 02:31:04PM +0530, Ojaswin Mujoo wrote:
> > Behaviour such as extent size hinting *should* be the same across
> > all filesystems that provide this functionality.  This makes using
> > extent size hints much easier for users, admins and application
> > developers. The last thing I want to hear is application devs tell
> > me at conferences that "we don't use extent size hints anymore
> > because ext4..."
> 
> Yes, makes sense :)  
> 
> Nothing to worry here tho as ext4 also treats the extsize value as a
> hint exactly like XFS. We have tried to keep the behavior as similar
> to XFS as possible for the exact reasons you mentioned. 

It is worth explicitly stating this (i.e. all the behaviours that
are the same) in the design documentation rather than just the
corner cases where it is different. It was certainly not clear how
failures were treated.

> And yes, we do plan to add a forcealign (or similar) feature for ext4 as
> well for atomic writes which would change the hint to a mandate

Ok. That should be stated, too.

FWIW, it would be a good idea to document this all in the kernel
documentation itself, so there is a guideline for other filesystems
to implement the same behaviour. e.g. in
Documentation/filesystems/extent-size-hints.rst

> > > 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
> > >    hint. That means on XFS for eof allocations (with extsize hint) only logical
> > >    start gets aligned.
> > 
> > I'm not sure I understand what you are saying here. XFS does extsize
> > alignment of both the start and end of post-eof extents the same as
> > it does for extents within EOF. For example:
> > 
> > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "bmap -vvp" foo
> > wrote 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0308 sec (129.815 KiB/sec and 32.4538 ops/sec)
> > foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..7]:          256504..256511    0 (256504..256511)     8 000000
> >    1: [8..31]:         256512..256535    0 (256512..256535)    24 010000
> >  FLAG Values:
> >     0100000 Shared extent
> >     0010000 Unwritten preallocated extent
> > 
> > There's a 4k written extent at 0, and a 12k unwritten extent
> > beyond EOF at 4k. I.e. we have a 16kB extent, as the hint
> > requires, that is correctly aligned beyond EOF.
> > 
> > If I then write another 4k at 20k (beyond both EOF and the unwritten
> > extent beyond EOF):
> > 
> > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "pwrite 20k 4k" -c "bmap -vvp" foo
> > wrote 4096/4096 bytes at offset 0
> > 4 KiB, 1 ops; 0.0210 sec (190.195 KiB/sec and 47.5489 ops/sec)
> > wrote 4096/4096 bytes at offset 20480
> > 4 KiB, 1 ops; 0.0001 sec (21.701 MiB/sec and 5555.5556 ops/sec)
> > foo:
> >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> >    0: [0..7]:          180000..180007    0 (180000..180007)     8 000000
> >    1: [8..39]:         180008..180039    0 (180008..180039)    32 010000
> >    2: [40..47]:        180040..180047    0 (180040..180047)     8 000000
> >    3: [48..63]:        180048..180063    0 (180048..180063)    16 010000
> >  FLAG Values:
> >     0100000 Shared extent
> >     0010000 Unwritten preallocated extent
> > 
> > You can see we did contiguous allocation of another 16kB at offset
> > 16kB, and then wrote to 20k for 4kB, i.e. the new extent was
> > correctly aligned at both sides as the extsize hint says it should
> > be.
> 
> Sorry for the confusion Dave. What was meant is that XFS would indeed
> respect extsize hint for EOF allocations but if we close the file, since
> we trim the blocks past EOF upon close, we would only see that the
> lstart is aligned but the end would not.

Right, but that is desired behaviour, especially when extsize is
large.  i.e. when the file is closed it is an indication that the
file will not be written again, so we don't need to keep post-eof
blocks around for fragmentation prevention reasons.

Removing post-EOF extents on close prevents large extsize hints from
consuming lots of unused space on files that are never going to be
written to again(*).  That's user visible, and because it can cause
premature ENOSPC, users will report this excessive space usage
behaviour as a bug (and they are right).  Hence removing post-eof
extents on file close when extent size hints are in use falls under
the heading of Good Behaviour To Have.

(*) think about how much space is wasted if you clone a kernel git
tree under a 1MB extent size hint directory. All those tiny header
files now take up 1MB of space on disk....
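
To put rough numbers on the footnote, here is a back-of-the-envelope
sketch. It is only a model, not filesystem code: the file sizes are
made up, a 4kB block size is assumed, and it assumes that without
trimming every non-empty file keeps one full extsize allocation.

```python
def allocated_bytes(file_size, granule):
    """Round file_size up to a whole number of granules (0 stays 0)."""
    if file_size == 0:
        return 0
    return -(-file_size // granule) * granule  # ceiling division


BLOCK = 4096        # assumed filesystem block size
EXTSIZE = 1 << 20   # the 1MB extent size hint from the footnote

# Hypothetical sizes of a few small source files, in bytes.
sizes = [512, 3000, 10_000, 70_000]

# Post-EOF blocks kept: each file holds a full extsize allocation.
kept = sum(allocated_bytes(s, EXTSIZE) for s in sizes)
# Post-EOF blocks trimmed on close: each file rounds up to a block.
trimmed = sum(allocated_bytes(s, BLOCK) for s in sizes)

print(f"kept: {kept} bytes, trimmed: {trimmed} bytes")
```

Under these assumptions the untrimmed case uses 4MB for four files
that need less than 100kB once the post-EOF blocks are removed.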

Keep in mind that when the file is opened for write again, the
extent size hint still gets applied to the new extents.  If the
extending write starts beyond the EOF extsize range, then the new
extent after the hole at EOF will be fully extsize aligned, as
expected.
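
As an illustration of what "fully extsize aligned" means here, the
sketch below rounds the start of a write down and its end up to the
extsize boundary. It is a simplified model of the arithmetic, not the
XFS allocator; the function name is made up for this example.

```python
def extsize_aligned_range(offset, length, extsize):
    """Return (start, end) of the extsize-aligned range covering a write.

    The start is rounded down and the end is rounded up to a multiple
    of extsize. All values are byte offsets.
    """
    start = (offset // extsize) * extsize
    end = -(-(offset + length) // extsize) * extsize  # ceiling
    return start, end


K = 1024
# 4k write at offset 0 with a 16k hint covers [0, 16k).
print(extsize_aligned_range(0, 4 * K, 16 * K))
# 4k write at 20k with a 16k hint covers [16k, 32k).
print(extsize_aligned_range(20 * K, 4 * K, 16 * K))
```

The second call matches the bmap output quoted above, where the write
at 20k ended up inside an allocation spanning 16k to 32k.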

If the new write is exactly extending the file, then the new extents
will not be extsize aligned - the start will be at the EOF block,
and they will be extsize -length-.  IOWs, the extent size is
maintained, just the logical alignment is not exactly extsize
aligned. This could be considered a bug, but it's never been an
issue for anyone because, in XFS, physical extent alignment is
separate (and maintained regardless of logical alignment) for extent
size hint based allocations.
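
A small sketch of that case, assuming a 16k hint and a file whose
post-EOF blocks were trimmed back to an EOF block boundary at 4k. The
helper names are hypothetical and this only models the logical
offsets described above, not the physical allocation.

```python
def is_aligned(value, extsize):
    """True if value is a multiple of extsize."""
    return value % extsize == 0


def extending_extent_after_trim(eof, extsize):
    """Logical extent for a write that starts exactly at EOF.

    Models the behaviour described above: the extent starts at the
    EOF block and is extsize long, so only the length is preserved.
    """
    start = eof
    return start, start + extsize


K = 1024
start, end = extending_extent_after_trim(4 * K, 16 * K)
print("start aligned:", is_aligned(start, 16 * K))
print("length equals extsize:", end - start == 16 * K)
```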

Adding force-align will prevent this behaviour from occurring, as
post-eof trimming will be done to extsize alignment, not to the EOF
block.  Hence open/close/open will not affect logical or physical
alignment of force-align extents (and hence won't affect atomic
writes).
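
The difference between the two trimming policies can be sketched as
follows. This assumes force-align trims to the next extsize boundary
past EOF, as described above; it is an illustration of the arithmetic
only, with a made-up EOF value and an assumed 4kB block size.

```python
def round_up(value, granule):
    """Round value up to the next multiple of granule."""
    return -(-value // granule) * granule


BLOCK = 4 * 1024     # assumed filesystem block size
EXTSIZE = 16 * 1024  # extent size hint
eof = 5000           # hypothetical file size in bytes

# Extent size hint only: post-EOF blocks are trimmed to the EOF block.
trim_to_block = round_up(eof, BLOCK)
# Force-align (as described above): trimmed to extsize alignment.
trim_to_extsize = round_up(eof, EXTSIZE)

for name, point in (("hint", trim_to_block),
                    ("force-align", trim_to_extsize)):
    print(name, point, "extsize aligned:", point % EXTSIZE == 0)
```

In this model the next extending allocation starts at the trim point,
so it stays extsize aligned only in the force-align case.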

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx