Hi Darrick,
As mentioned internally, we have an issue for atomic writes [0]: we
get an aligned but not fully written extent when the initial write is
smaller than the forcealign size, for example:
# /mkfs.xfs -f -d forcealign=16k /dev/sda
...
# mount /dev/sda mnt
# touch mnt/file
# /test-pwritev2 -a -d -l 4096 -p 0 /root/mnt/file # direct IO, atomic write, 4096B at pos 0
# filefrag -v mnt/file
Filesystem type is: 58465342
File size of mnt/file is 4096 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:         24..        24:      1:             last,eof
mnt/file: 1 extent found
# /test-pwritev2 -a -d -l 16384 -p 0 /root/mnt/file
wrote -1 bytes at pos 0 write_size=16384
#
This breaks atomic writes: the 16K write now spans 2x mappings and is
therefore split into 2x BIOs, which we cannot tolerate, since an atomic
write has to be issued as a single BIO.
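
To make the failure concrete, here is a tiny user-space model (not
kernel code) that counts how many separate pieces a write range crosses,
where a piece is either an existing extent or a hole that still needs
allocating. `struct extent` and `count_segments()` are hypothetical names
for illustration only, and the model assumes each piece ends up as its
own mapping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one contiguous file mapping, in fs blocks. */
struct extent {
	uint64_t logical;	/* first file block covered */
	uint64_t physical;	/* first disk block backing it */
	uint64_t len;		/* length in blocks */
};

/*
 * Count the pieces (extents and holes) that the block range
 * [start, start + len) crosses. ext[] must be sorted by logical
 * offset and non-overlapping. Adjacent extents are counted
 * separately even if they happen to be physically contiguous.
 */
size_t count_segments(const struct extent *ext, size_t n,
		      uint64_t start, uint64_t len)
{
	uint64_t pos = start, end = start + len;
	size_t i, segs = 0;

	assert(n == 0 || ext != NULL);

	for (i = 0; i < n && pos < end; i++) {
		uint64_t e_start = ext[i].logical;
		uint64_t e_end = e_start + ext[i].len;

		if (e_end <= pos)
			continue;	/* extent lies before the range */
		if (e_start >= end)
			break;		/* extent lies after the range */
		if (e_start > pos)
			segs++;		/* hole in front of this extent */
		segs++;			/* the extent itself */
		pos = e_end;
	}
	if (pos < end)
		segs++;			/* trailing hole up to the end */
	return segs;
}
```

With the layout from the filefrag output above (a single block at
0..0), a 4-block write at block 0 gives 2 pieces. If the whole aligned
extent 0..3 were present, the same write would give 1.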
So how about this change on top:
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 731260a5af6d..6609f1058ae3 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -685,6 +685,12 @@ xfs_can_free_eofblocks(
 	end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip));
 	if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1)
 		end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb);
+
+	/* Don't trim eof blocks */
+	if (xfs_inode_force_align(ip)) {
+		end_fsb = roundup_64(end_fsb, xfs_get_extsz_hint(ip));
+	}
+
 	last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes);
 	if (last_fsb <= end_fsb)
 		return false;
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 0c7008322326..c906e3a424d1 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -291,6 +291,10 @@ xfs_iomap_write_direct(
 		}
 	}

+	if (xfs_inode_force_align(ip)) {
+		bmapi_flags = XFS_BMAPI_ZERO;
+	}
+
 	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, dblocks,
 			rblocks, force, &tp);
 	if (error)
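
As a worked check of the rounding in the xfs_can_free_eofblocks() hunk,
here is a user-space sketch of the arithmetic. `roundup_blocks()` is a
hypothetical stand-in for the kernel's roundup_64(), `bytes_to_fsb()`
models the round-up behaviour of XFS_B_TO_FSB(), and the 4-block hint is
an assumption that follows from forcealign=16k with 4096-byte blocks:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for roundup_64(): round x up to a multiple of y. */
uint64_t roundup_blocks(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* Bytes to filesystem blocks, rounding up (models XFS_B_TO_FSB). */
uint64_t bytes_to_fsb(uint64_t bytes, uint64_t blocksize)
{
	assert(blocksize != 0);
	return (bytes + blocksize - 1) / blocksize;
}

/*
 * First block that would still count as "past EOF" once end_fsb has
 * been rounded up to the extent size hint (given in blocks).
 */
uint64_t eof_trim_start(uint64_t isize, uint64_t blocksize,
			uint64_t extsz_blocks)
{
	return roundup_blocks(bytes_to_fsb(isize, blocksize), extsz_blocks);
}
```

For a 4096-byte file, end_fsb moves from 1 to 4, so blocks 1..3 of the
aligned extent are no longer treated as post-EOF blocks.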
Which gives:
# /mkfs.xfs -d forcealign=16k /dev/sda
...
# /test-pwritev2 -a -d -l 4096 -p 0 /root/mnt/file
wrote 4096 bytes at pos 0 write_size=4096
# filefrag -v mnt/file
Filesystem type is: 58465342
File size of mnt/file is 4096 (1 block of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       3:         24..        27:      4:             last,eof
mnt/file: 1 extent found
#
# /test-pwritev2 -a -d -l 16384 -p 0 /root/mnt/file
wrote 16384 bytes at pos 0 write_size=16384
# filefrag -v mnt/file
Filesystem type is: 58465342
File size of mnt/file is 16384 (4 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       3:         24..        27:      4:             last,eof
mnt/file: 1 extent found
#
Or maybe we only make that change when the FS_XFLAG_ATOMICWRITES flag
is set. Previously we were pre-zeroing the complete file to get around
this.
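
For reference, the pre-zeroing workaround can be sketched roughly as
below. This is only an illustration and not the tool we actually used:
`prezero_file()` is a hypothetical helper, and the assumption is that
plain pwrite() calls followed by fsync() are enough to get every block
of the file allocated and written before any atomic write is issued:

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write zeroes over [0, size) in chunk-sized
 * pieces, then fsync. Returns 0 on success or a negative errno.
 */
int prezero_file(int fd, off_t size, size_t chunk)
{
	char *buf;
	off_t pos = 0;
	int ret = 0;

	if (size < 0 || chunk == 0)
		return -EINVAL;

	buf = calloc(1, chunk);		/* zero-filled source buffer */
	if (!buf)
		return -ENOMEM;

	while (pos < size) {
		size_t want = chunk;
		ssize_t done;

		if ((off_t)want > size - pos)
			want = (size_t)(size - pos);

		done = pwrite(fd, buf, want, pos);
		if (done < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			ret = -errno;
			break;
		}
		if (done == 0) {
			ret = -EIO;		/* no progress, give up */
			break;
		}
		pos += done;			/* handles short writes */
	}

	if (!ret && fsync(fd) < 0)
		ret = -errno;

	free(buf);
	return ret;
}
```

Something like prezero_file(fd, file_size, 16384) would be run once on a
freshly created file, before the first atomic write.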
Thanks,
John
[0]
https://lore.kernel.org/linux-scsi/20240111161522.GB16626@xxxxxx/T/#mbc6824fbe9ce62c9506aa4c3f281173747695d77
(just referencing for others)