Re: [PATCH 1/5] xfs: don't treat append-only files as having preallocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 13, 2024 at 10:28:55AM +0200, Christoph Hellwig wrote:
> On Thu, Jun 13, 2024 at 04:03:53PM +1000, Dave Chinner wrote:
> > I disagree, there was a very good reason for this behaviour:
> > preventing append-only log files from getting excessively fragmented
> > because speculative prealloc would get removed on close().
> 
> Where is that very clear intent documented?  Not in the original
> commit message (which is very sparse) and no where in any documentation
> I can find.

We've lost all the internal SGI bug databases, so there's little to
know evidence I can point at. But at the time, it was a well known
problem amongst Irix XFS engineers that append-only log files would
regularly get horribly fragmented.

There'd been several escalations over that behaviour over the years
w.r.t. large remote servers (think of facilities that "don't trust
the logs on client machines because they might be compromised"). In
general, the fixes for these applications tended to require the
loggin server application to use F_RESVSP to do the append-only log
file initialisation.  That got XFS_DIFLAG_PREALLOC set on the files,
so then anything allocated by appending writes beyond EOF was left
alone. That small change was largely sufficient to mitigate worst
case log file fragmentation on Irix-XFS.

So when adding a flag on disk for Linux-XFS to say "this is an
append only file" it made lots of sense to make it behave like
XFS_DIFLAG_PREALLOC had already been set on the inode without
requring the application to do anything to set that up.

I'll note that the patches sent to the list by Ethan Benson to
originally implement XFS_DIFLAG_APPEND (and others) is not exactly
what was committed in this commit:

https://marc.info/?l=linux-xfs&m=106360278223548&w=2

The last version posted on the list was this:

https://marc.info/?l=linux-xfs&m=106109662212214&w=2

but the version committed had lots of things renamed, sysctls for
sync and nodump inheritance and other bits and pieces including
the EOF freeing changes to skip if DIFLAG_APPEND was set.

It is clear that there was internal SGI discussion, modification and
review of the original proposed patch set, and none of that internal
discussion is on open mailing lists. We might have the historical
XFS code and Linux mailing list archives, but that doesn't always
tell us what institutional knowledge was behind subtle changes to
publicly proposed patches like this....

> > i.e. applications that slowly log messages to append only files
> > with the pattern open(O_APPEND); write(a single line to the log);
> > close(); caused worst case file fragmentation because the close()
> > always removed the speculative prealloc beyond EOF.
> 
> That case should be covered by the XFS_IDIRTY_RELEASE, at least
> except for O_SYNC workloads. 

Ah, so I fixed the problem independently 7 or 8 years later to fix
Linux NFS server performance issues. Ok, that makes removing the
flag less bad, but I still don't see the harm in keeping it there
given that behaviour has existed for the past 20 years....

> > The fix for this pessimisitic XFS behaviour is for the application
> > to use chattr +A (like they would for ext3/4) hence triggering the
> > existence of XFS_DIFLAG_APPEND and that avoided the removal
> > speculative delalloc removed when the file is closed. hence the
> > fragmentation problems went away.
> 
> For ext4 the EXT4_APPEND_FL flag does not cause any difference
> in allocation behavior.

Sure, but ext4 doesn't have speculative preallocation beyond EOF to
prevent fragmentation, either.

> For the historic ext2 driver it apparently
> did just, with an XXX comment marking this as a bug, but for ext3 it
> also never did looking back quite a bit in history.

Ditto - when the filesystem isn't allocating anything beyond EOF,
there's little point in trying to removing blocks beyond EOF that
can't exist on final close()...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux