[PATCH -next v5 7/8] xfs: speed up truncating down a big realtime inode

Zhang Yi <yi.zhang@xxxxxxxxxxxxxxx> · Thu, 13 Jun 2024 17:00:32 +0800

From: Zhang Yi <yi.zhang@xxxxxxxxxx>

When truncating down a big realtime inode, zeroing out the entire aligned
EOF extent gets slower as the rtextsize increases. Fortunately,
__xfs_bunmapi() aligns the unmapped range to rtextsize, then splits the
extent and converts the blocks beyond EOF to unwritten. So speed this up
by adjusting the unitsize to the filesystem blocksize when truncating
down a big realtime inode, and let __xfs_bunmapi() convert the tail
blocks to unwritten. This improves performance significantly.

 # mkfs.xfs -f -rrtdev=/dev/pmem1s -m reflink=0,rmapbt=0 \
            -d rtinherit=1 -r extsize=$rtextsize /dev/pmem2s
 # mount -ortdev=/dev/pmem1s /dev/pmem2s /mnt/scratch
 # for i in {1..1000}; \
   do dd if=/dev/zero of=/mnt/scratch/$i bs=$rtextsize count=1024; done
 # sync
 # time for i in {1..1000}; \
   do xfs_io -c "truncate 4k" /mnt/scratch/$i; done

 rtextsize       8k      16k      32k      64k     256k     1024k
 before:       9.601s  10.229s  11.153s  12.086s  12.259s  20.141s
 after:        9.710s   9.642s   9.958s   9.441s  10.021s  10.526s

Signed-off-by: Zhang Yi <yi.zhang@xxxxxxxxxx>
---
 fs/xfs/xfs_inode.c | 10 ++++++++--
 fs/xfs/xfs_iops.c  |  9 +++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)
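
The rounding argument in the commit message can be sketched in plain
userspace C. This is only an illustration, not kernel code: the helper
names roundup_u64(), tail_zero_bytes() and first_unmap_fsb() are
hypothetical, and the 4k filesystem block size used in the note below
is an assumption, not something stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x up to the next multiple of unit. Division is used instead of
 * masking because an rtextsize need not be a power of two.
 */
uint64_t roundup_u64(uint64_t x, uint64_t unit)
{
	assert(unit > 0);
	return (x + unit - 1) / unit * unit;
}

/*
 * Bytes between the new EOF and the next unit boundary, i.e. the tail
 * that has to be zeroed when the truncate is aligned to unit.
 */
uint64_t tail_zero_bytes(uint64_t new_size, uint64_t unit)
{
	return roundup_u64(new_size, unit) - new_size;
}

/*
 * First file offset to unmap, in filesystem blocks, after rounding the
 * new size up to unit (bytes are converted to blocks rounding up).
 */
uint64_t first_unmap_fsb(uint64_t new_size, uint64_t unit,
			 uint64_t blocksize)
{
	assert(blocksize > 0);
	return (roundup_u64(new_size, unit) + blocksize - 1) / blocksize;
}
```

For the "truncate 4k" case above with a 1024k rtextsize,
tail_zero_bytes(4096, 1048576) gives 1044480 bytes to zero and
first_unmap_fsb(4096, 1048576, 4096) gives block 256. With the unit
reduced to the assumed 4k block size, the tail is 0 bytes and unmapping
starts at block 1, leaving the rest of the rt extent to be converted to
unwritten by __xfs_bunmapi() as described in the commit message.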

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 92daa2279053..5e837ed093b0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1487,6 +1487,7 @@ xfs_itruncate_extents_flags(
 	struct xfs_trans	*tp = *tpp;
 	xfs_fileoff_t		first_unmap_block;
 	int			error = 0;
+	unsigned int		unitsize = xfs_inode_alloc_unitsize(ip);
 
 	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
 	if (atomic_read(&VFS_I(ip)->i_count))
@@ -1510,9 +1511,14 @@ xfs_itruncate_extents_flags(
 	 *
 	 * We have to free all the blocks to the bmbt maximum offset, even if
 	 * the page cache can't scale that far.
+	 *
+	 * For a big realtime inode, don't align to the allocation unitsize;
+	 * __xfs_bunmapi() will split the extent and mark the tail unwritten.
 	 */
-	first_unmap_block = XFS_B_TO_FSB(mp,
-			roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));
+	if (xfs_inode_has_bigrtalloc(ip))
+		unitsize = i_blocksize(VFS_I(ip));
+	first_unmap_block = XFS_B_TO_FSB(mp, roundup_64(new_size, unitsize));
+
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
 		return 0;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 8af13fd37f1b..1903c06d39bc 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -862,6 +862,15 @@ xfs_setattr_truncate_data(
 	/* Truncate down */
 	blocksize = xfs_inode_alloc_unitsize(ip);
 
+	/*
+	 * For a big realtime inode, zeroing out the entire EOF extent gets
+	 * slower as the rtextsize increases. Speed it up by adjusting the
+	 * blocksize to the filesystem blocksize and letting __xfs_bunmapi()
+	 * split the extent and convert the tail blocks to unwritten.
+	 */
+	if (xfs_inode_has_bigrtalloc(ip))
+		blocksize = i_blocksize(inode);
+
 	/*
 	 * iomap won't detect a dirty page over an unwritten block (or a cow
 	 * block over a hole) and subsequently skips zeroing the newly post-EOF
-- 
2.39.2