On Sun, Jun 23, 2024 at 07:34:51AM +0200, Christoph Hellwig wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > When we have a workload that does open/read/close in parallel with other > allocation, the file becomes rapidly fragmented. This is due to close() > calling xfs_file_release() and removing the speculative preallocation > beyond EOF. > > Add a check for a writable context to xfs_file_release to skip the > post-EOF block freeing (an the similarly pointless flushing on truncate > down). > > Before: > > Test 1: sync write fragmentation counts > > /mnt/scratch/file.0: 919 > /mnt/scratch/file.1: 916 > /mnt/scratch/file.2: 919 > /mnt/scratch/file.3: 920 > /mnt/scratch/file.4: 920 > /mnt/scratch/file.5: 921 > /mnt/scratch/file.6: 916 > /mnt/scratch/file.7: 918 > > After: > > Test 1: sync write fragmentation counts > > /mnt/scratch/file.0: 24 > /mnt/scratch/file.1: 24 > /mnt/scratch/file.2: 11 > /mnt/scratch/file.3: 24 > /mnt/scratch/file.4: 3 > /mnt/scratch/file.5: 24 > /mnt/scratch/file.6: 24 > /mnt/scratch/file.7: 23 > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > [darrick: wordsmithing, fix commit message] > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> > [hch: ported to the new ->release code structure] > Signed-off-by: Christoph Hellwig <hch@xxxxxx> I like how this has gotten much pared down from what's been lurking in my tree for ages. Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > --- > fs/xfs/xfs_file.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c > index 0380e0b1d9c6c7..8d70171678fe24 100644 > --- a/fs/xfs/xfs_file.c > +++ b/fs/xfs/xfs_file.c > @@ -1228,12 +1228,18 @@ xfs_file_release( > * There is no point in freeing blocks here for open but unlinked files > * as they will be taken care of by the inactivation path soon. > * > + * When releasing a read-only context, don't flush data or trim post-EOF > + * blocks. This avoids open/read/close workloads from removing EOF > + * blocks that other writers depend upon to reduce fragmentation. > + * > * If we can't get the iolock just skip truncating the blocks past EOF > * because we could deadlock with the mmap_lock otherwise. We'll get > * another chance to drop them once the last reference to the inode is > * dropped, so we'll never leak blocks permanently. > */ > - if (inode->i_nlink && xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) { > + if (inode->i_nlink && > + (file->f_mode & FMODE_WRITE) && > + xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL)) { > if (xfs_can_free_eofblocks(ip) && > !xfs_iflags_test(ip, XFS_IDIRTY_RELEASE)) { > /* > -- > 2.43.0 > >