On 29/11/24 08:32, Zorro Lang wrote:
On Fri, Nov 15, 2024 at 07:49:26PM +1030, Qu Wenruo wrote:
[BUG]
When developing sector size < page size handling for btrfs, I'm hitting
a data corruption, which is only possible with the following out-of-tree
patches:
btrfs: allow inline data extents creation if sector size < page size
btrfs: allow buffered write to skip full page if it's sector aligned
[CAUSE]
Thankfully no upstream kernels are affected, even if some one is
mounting a btrfs created by x86_64 with inlined data extents, they won't
hit the corruption.
The root cause is that when reading inline extents, we zero out the
whole remaining range until folio end.
This means such zeroing out can cover ranges that is dirtied but not yet
written back, thus lead to data corruption.
This needs all the following conditions to be met:
- Sector size < page size
So no x86_64 is affected. The most common users should be Asahi Linux.
But they are safe due to the next two conditions.
- Inline data extents are present
For sector size < page size cases, we do not allow creating new inline
data extents but only reading it.
But even all above cases are met by using a x86_64 created btrfs with
inlined data extents, the next point will still save us.
- Partial uptodate folios are allowed
This requires the out-of-tree patch "btrfs: allow buffered write to skip
full page if it's sector aligned", or buffered write will read out the
whole folio before dirting any range.
So end users are completely safe.
[TEST CASE]
The test case itself is pretty straightforward:
- Buffered write [0, 4k)
- Drop all page cache
- Buffered write [8k, 12k)
- Verify the file content
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
For anyone who wants to verify the failure, please fetch the following
branch, and reset to commit 4df35fbb829dfbcf64a914e5c8f652d9a3ad5227
("btrfs: allow inline data extents creation if sector size < page
size").
https://github.com/adam900710/linux.git subpage
The top commit e7338d321bdf48e3b503c40e8eca7d7592709c83
("btrfs: fix inline data extent reads which zero out the remaining part") is the fix.
---
tests/btrfs/327 | 58 +++++++++++++++++++++++++++++++++++++++++++++
tests/btrfs/327.out | 2 ++
2 files changed, 60 insertions(+)
create mode 100755 tests/btrfs/327
create mode 100644 tests/btrfs/327.out
diff --git a/tests/btrfs/327 b/tests/btrfs/327
new file mode 100755
index 00000000..72269fc7
--- /dev/null
+++ b/tests/btrfs/327
@@ -0,0 +1,58 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 SUSE Linux Products GmbH. All Rights Reserved.
+#
+# FS QA Test 327
+#
+# Make sure reading inlined extents doesn't cause any corruption.
+#
+# This is a preventive test case inspired by btrfs/149, which can cause
+# data corruption when the following out-of-tree patches are applied and
+# the sector size is smaller than page size:
+#
+# btrfs: allow inline data extents creation if sector size < page size
+# btrfs: allow buffered write to skip full page if it's sector aligned
+#
+# Thankfully no upstream kernel is affected.
+
+. ./common/preamble
+_begin_fstest auto quick compress
+
+_require_scratch
+
+# We need 4K sector size support, especially this case can only be triggered
+# with sector size < page size for now.
+#
+# We do not check the page size and not_run so far, as in the long term btrfs
+# will support larger folios, then in that future 4K page size should be enough
+# to trigger the bug.
+_require_btrfs_support_sectorsize 4096
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount "-o compress,max_inline=4095"
Also, can you please add a comment explaining why max_inline is set to
pagesize - 1?
Thanks, Anand
+
+# Create one inlined data extent, only when using compression we can
+# create an inlined data extent up to sectorsize.
+# And for sector size < page size cases, we need the out-of-tree patch
+# "btrfs: allow inline data extents creation if sector size < page size" to
+# create such extent.
+xfs_io -f -c "pwrite 0 4k" "$SCRATCH_MNT/foobar" > /dev/null
$XFS_IO_PROG
+
+# Drop the cache, we need to read out the above inlined data extent.
+echo 3 > /proc/sys/vm/drop_caches
+
+# Write into range [8k, 12k), with the out-of-tree patch "btrfs: allow
+# buffered write to skip full page if it's sector aligned", such write will not
+# trigger the read on the folio.
+xfs_io -f -c "pwrite 8k 4k" "$SCRATCH_MNT/foobar" > /dev/null
$XFS_IO_PROG
+
+# Verify the checksum, for the affected devel branch, the read of inline extent
+# will zero out all the remaining range of the folio, screwing up the content
+# at [8K, 12k).
+_md5_checksum "$SCRATCH_MNT/foobar"
+
+_scratch_unmount
This's not needed if it's not a necessary test step.
Others look good to me, if no more review points from btrfs list, I'll merge
this patch with above changes.
Reviewed-by: Zorro Lang <zlang@xxxxxxxxxx>
+
+# success, all done
+status=0
+exit
diff --git a/tests/btrfs/327.out b/tests/btrfs/327.out
new file mode 100644
index 00000000..aebf8c72
--- /dev/null
+++ b/tests/btrfs/327.out
@@ -0,0 +1,2 @@
+QA output created by 327
+277f3840b275c74d01e979ea9d75ac19
--
2.46.0