This is a note to let you know that I've just added the patch titled xfs: fix sb write verify for lazysbcount to the 6.1-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: xfs-fix-sb-write-verify-for-lazysbcount.patch and it can be found in the queue-6.1 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From stable+bounces-42901-greg=kroah.com@xxxxxxxxxxxxxxx Wed May 1 20:41:52 2024 From: Leah Rumancik <leah.rumancik@xxxxxxxxx> Date: Wed, 1 May 2024 11:41:00 -0700 Subject: xfs: fix sb write verify for lazysbcount To: stable@xxxxxxxxxxxxxxx Cc: linux-xfs@xxxxxxxxxxxxxxx, amir73il@xxxxxxxxx, chandan.babu@xxxxxxxxxx, fred@xxxxxxxxxxxxxx, Long Li <leo.lilong@xxxxxxxxxx>, "Darrick J . Wong" <djwong@xxxxxxxxxx>, Leah Rumancik <leah.rumancik@xxxxxxxxx> Message-ID: <20240501184112.3799035-12-leah.rumancik@xxxxxxxxx> From: Long Li <leo.lilong@xxxxxxxxxx> [ Upstream commit 59f6ab40fd8735c9a1a15401610a31cc06a0bbd6 ] When lazysbcount is enabled, fsstress and loop mount/unmount test report the following problems: XFS (loop0): SB summary counter sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460, xfs_sb block 0x0 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00 XFSB.........(.. 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a i.|._.D..t....4Z 00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80 ..... .......... 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ 00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00 ................ 00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19 ................ XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply +0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed This corruption will shutdown the file system and the file system will no longer be mountable. The following script can reproduce the problem, but it may take a long time. #!/bin/bash device=/dev/sda testdir=/mnt/test round=0 function fail() { echo "$*" exit 1 } mkdir -p $testdir while [ $round -lt 10000 ] do echo "******* round $round ********" mkfs.xfs -f $device mount $device $testdir || fail "mount failed!" fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null & sleep 4 killall -w fsstress umount $testdir xfs_repair -e $device > /dev/null if [ $? -eq 2 ];then echo "ERR CODE 2: Dirty log exception during repair." exit 1 fi round=$(($round+1)) done With lazysbcount is enabled, There is no additional lock protection for reading m_ifree and m_icount in xfs_log_sb(), if other cpu modifies the m_ifree, this will make the m_ifree greater than m_icount. For example, consider the following sequence and ifreedelta is postive: CPU0 CPU1 xfs_log_sb xfs_trans_unreserve_and_mod_sb ---------- ------------------------------ percpu_counter_sum(&mp->m_icount) percpu_counter_add_batch(&mp->m_icount, idelta, XFS_ICOUNT_BATCH) percpu_counter_add(&mp->m_ifree, ifreedelta); percpu_counter_sum(&mp->m_ifree) After this, incorrect inode count (sb_ifree > sb_icount) will be writen to the log. In the subsequent writing of sb, incorrect inode count (sb_ifree > sb_icount) will fail to pass the boundary check in xfs_validate_sb_write() that cause the file system shutdown. When lazysbcount is enabled, we don't need to guarantee that Lazy sb counters are completely correct, but we do need to guarantee that sb_ifree <= sb_icount. On the other hand, the constraint that m_ifree <= m_icount must be satisfied any time that there /cannot/ be other threads allocating or freeing inode chunks. If the constraint is violated under these circumstances, sb_i{count,free} (the ondisk superblock inode counters) maybe incorrect and need to be marked sick at unmount, the count will be rebuilt on the next mount. Fixes: 8756a5af1819 ("libxfs: add more bounds checking to sb sanity checks") Signed-off-by: Long Li <leo.lilong@xxxxxxxxxx> Reviewed-by: Darrick J. Wong <djwong@xxxxxxxxxx> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx> Signed-off-by: Leah Rumancik <leah.rumancik@xxxxxxxxx> Acked-by: Darrick J. Wong <djwong@xxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- fs/xfs/libxfs/xfs_sb.c | 4 +++- fs/xfs/xfs_mount.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 1 deletion(-) --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -973,7 +973,9 @@ xfs_log_sb( */ if (xfs_has_lazysbcount(mp)) { mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); - mp->m_sb.sb_ifree = percpu_counter_sum(&mp->m_ifree); + mp->m_sb.sb_ifree = min_t(uint64_t, + percpu_counter_sum(&mp->m_ifree), + mp->m_sb.sb_icount); mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); } --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -538,6 +538,20 @@ xfs_check_summary_counts( return 0; } +static void +xfs_unmount_check( + struct xfs_mount *mp) +{ + if (xfs_is_shutdown(mp)) + return; + + if (percpu_counter_sum(&mp->m_ifree) > + percpu_counter_sum(&mp->m_icount)) { + xfs_alert(mp, "ifree/icount mismatch at unmount"); + xfs_fs_mark_sick(mp, XFS_SICK_FS_COUNTERS); + } +} + /* * Flush and reclaim dirty inodes in preparation for unmount. Inodes and * internal inode structures can be sitting in the CIL and AIL at this point, @@ -1077,6 +1091,7 @@ xfs_unmountfs( if (error) xfs_warn(mp, "Unable to free reserved block pool. " "Freespace may not be correct on next mount."); + xfs_unmount_check(mp); xfs_log_unmount(mp); xfs_da_unmount(mp); Patches currently in stable-queue which might be from kroah.com@xxxxxxxxxxxxxxx are queue-6.1/xfs-iomap-move-delalloc-punching-to-iomap.patch queue-6.1/xfs-fix-off-by-one-block-in-xfs_discard_folio.patch queue-6.1/xfs-invalidate-block-device-page-cache-during-unmount.patch queue-6.1/xfs-drop-write-error-injection-is-unfixable-remove-it.patch queue-6.1/iomap-buffered-write-failure-should-not-truncate-the-page-cache.patch queue-6.1/xfs-fix-super-block-buf-log-item-uaf-during-force-shutdown.patch queue-6.1/xfs-fix-incorrect-i_nlink-caused-by-inode-racing.patch queue-6.1/xfs-estimate-post-merge-refcounts-correctly.patch queue-6.1/xfs-fix-log-recovery-when-unknown-rocompat-bits-are-set.patch queue-6.1/xfs-punching-delalloc-extents-on-write-failure-is-racy.patch queue-6.1/xfs-allow-inode-inactivation-during-a-ro-mount-log-recovery.patch queue-6.1/iomap-write-iomap-validity-checks.patch queue-6.1/xfs-attach-dquots-to-inode-before-reading-data-cow-fork-mappings.patch queue-6.1/xfs-fix-sb-write-verify-for-lazysbcount.patch queue-6.1/xfs-wait-iclog-complete-before-tearing-down-ail.patch queue-6.1/xfs-use-byte-ranges-for-write-cleanup-ranges.patch queue-6.1/xfs-xfs_bmap_punch_delalloc_range-should-take-a-byte-range.patch queue-6.1/xfs-write-page-faults-in-iomap-are-not-buffered-writes.patch queue-6.1/xfs-short-circuit-xfs_growfs_data_private-if-delta-is-zero.patch queue-6.1/xfs-fix-incorrect-error-out-in-xfs_remove.patch queue-6.1/xfs-invalidate-xfs_bufs-when-allocating-cow-extents.patch queue-6.1/xfs-hoist-refcount-record-merge-predicates.patch queue-6.1/xfs-get-root-inode-correctly-at-bulkstat.patch queue-6.1/xfs-use-iomap_valid-method-to-detect-stale-cached-iomaps.patch