[BUG] When developing sector size < page size handling for btrfs, I'm hitting a data corruption, which is only possible with the following out-of-tree patches: btrfs: allow inline data extents creation if sector size < page size btrfs: allow buffered write to skip full page if it's sector aligned [CAUSE] Thankfully no upstream kernels are affected, even if some one is mounting a btrfs created by x86_64 with inlined data extents, they won't hit the corruption. The root cause is that when reading inline extents, we zero out the whole remaining range until folio end. This means such zeroing out can cover ranges that is dirtied but not yet written back, thus lead to data corruption. This needs all the following conditions to be met: - Sector size < page size So no x86_64 is affected. The most common users should be Asahi Linux. But they are safe due to the next two conditions. - Inline data extents are present For sector size < page size cases, we do not allow creating new inline data extents but only reading it. But even all above cases are met by using a x86_64 created btrfs with inlined data extents, the next point will still save us. - Partial uptodate folios are allowed This requires the out-of-tree patch "btrfs: allow buffered write to skip full page if it's sector aligned", or buffered write will read out the whole folio before dirting any range. So end users are completely safe. [TEST CASE] The test case itself is pretty straightforward: - Buffered write [0, 4k) - Drop all page cache - Buffered write [8k, 12k) - Verify the file content Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> --- For anyone who wants to verify the failure, please fetch the following branch, and reset to commit 4df35fbb829dfbcf64a914e5c8f652d9a3ad5227 ("btrfs: allow inline data extents creation if sector size < page size"). https://github.com/adam900710/linux.git subpage The top commit e7338d321bdf48e3b503c40e8eca7d7592709c83 ("btrfs: fix inline data extent reads which zero out the remaining part") is the fix. --- tests/btrfs/327 | 58 +++++++++++++++++++++++++++++++++++++++++++++ tests/btrfs/327.out | 2 ++ 2 files changed, 60 insertions(+) create mode 100755 tests/btrfs/327 create mode 100644 tests/btrfs/327.out diff --git a/tests/btrfs/327 b/tests/btrfs/327 new file mode 100755 index 00000000..72269fc7 --- /dev/null +++ b/tests/btrfs/327 @@ -0,0 +1,58 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved. +# +# FS QA Test 327 +# +# Make sure reading inlined extents doesn't cause any corruption. +# +# This is a preventive test case inspired by btrfs/149, which can cause +# data corruption when the following out-of-tree patches are applied and +# the sector size is smaller than page size: +# +# btrfs: allow inline data extents creation if sector size < page size +# btrfs: allow buffered write to skip full page if it's sector aligned +# +# Thankfully no upstream kernel is affected. + +. ./common/preamble +_begin_fstest auto quick compress + +_require_scratch + +# We need 4K sector size support, especially this case can only be triggered +# with sector size < page size for now. +# +# We do not check the page size and not_run so far, as in the long term btrfs +# will support larger folios, then in that future 4K page size should be enough +# to trigger the bug. +_require_btrfs_support_sectorsize 4096 + +_scratch_mkfs >>$seqres.full 2>&1 +_scratch_mount "-o compress,max_inline=4095" + +# Create one inlined data extent, only when using compression we can +# create an inlined data extent up to sectorsize. +# And for sector size < page size cases, we need the out-of-tree patch +# "btrfs: allow inline data extents creation if sector size < page size" to +# create such extent. +xfs_io -f -c "pwrite 0 4k" "$SCRATCH_MNT/foobar" > /dev/null + +# Drop the cache, we need to read out the above inlined data extent. +echo 3 > /proc/sys/vm/drop_caches + +# Write into range [8k, 12k), with the out-of-tree patch "btrfs: allow +# buffered write to skip full page if it's sector aligned", such write will not +# trigger the read on the folio. +xfs_io -f -c "pwrite 8k 4k" "$SCRATCH_MNT/foobar" > /dev/null + +# Verify the checksum, for the affected devel branch, the read of inline extent +# will zero out all the remaining range of the folio, screwing up the content +# at [8K, 12k). +_md5_checksum "$SCRATCH_MNT/foobar" + +_scratch_unmount + +# success, all done +status=0 +exit diff --git a/tests/btrfs/327.out b/tests/btrfs/327.out new file mode 100644 index 00000000..aebf8c72 --- /dev/null +++ b/tests/btrfs/327.out @@ -0,0 +1,2 @@ +QA output created by 327 +277f3840b275c74d01e979ea9d75ac19 -- 2.46.0