Re: [PATCH RFC] common/btrfs: use _scratch_cycle_mount to ensure all page caches are dropped

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]





On 2023/6/20 00:13, Filipe Manana wrote:
On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@xxxxxxxx> wrote:

[BUG]
There is a chance that btrfs/266 would fail on aarch64 with 64K page
size. (No matter if it's 4K sector size or 64K sector size)

The failure indicates that one or more mirrors are not properly fixed.

[CAUSE]
I have added some trace events into the btrfs IO paths, including
__btrfs_submit_bio() and __end_bio_extent_readpage().

When the test failed, the trace looks like this:

  112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
                                     ^^^ Initial read on the full 192K file
  112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
                                     ^^^ Repair on the first 64K block
                                         Which would success
  112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
                                     ^^^ Repair on the second 64K block
                                         Which would fail
  112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
                                     ^^^ Repair on the third 64K block
                                         Which would success
  112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
                                          ^^^ The repair succeeded, the
                                              full 192K read finished.

  112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
  112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
  112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
  112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0

But above read only happen for mirror 1 and mirror 3, mirror 2 is not
involved.
This means by somehow, the read on mirror 2 didn't happen, mostly
due to something wrong during the drop_caches call.

It may be an async operation or some hardware specific behavior.

On the other hand, for test cases doing the same operation but utilizing
direct IO, aka btrfs/267, it never fails as we never populate the page
cache thus would always read from the disk.

[WORKAROUND]
The root cause is in the "echo 3 > drop_caches", which I'm still
debugging.

But at the same time, we can workaround the problem by forcing a
cycle mount of scratch device, inside _btrfs_buffered_read_on_mirror().

By this we can ensure all page caches are dropped no matter if
drop_caches is working correctly.

With this patch, I no longer hit the failure on aarch64 with 64K page
size anymore, while before this the test case would always fail during a
-I 10 run.

Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
RFC:
The root cause is still inside that "echo 3 > drop_caches", but I don't
have any better solution if such fundamental debug function is not
working as expected.

Thus this is only a workaround, before I can pin down the root cause of
that drop_cache hiccup.
---
  common/btrfs | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/common/btrfs b/common/btrfs
index 344509ce..1d522c33 100644
--- a/common/btrfs
+++ b/common/btrfs
@@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
         local size=$5

         echo 3 > /proc/sys/vm/drop_caches
+       # Above drop_caches doesn't seem to drop every pages on aarch64 with
+       # 64K page size.
+       # So here as a workaround, cycle mount the SCRATCH_MNT to ensure
+       # the cache are dropped.
+       _scratch_cycle_mount

Btw, I'm getting some tests failing because of this change.

For e.g.:

./check btrfs/143
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP
PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/143 6s ... [failed, exit status 1]- output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
     --- tests/btrfs/143.out 2020-06-10 19:29:03.818519162 +0100
     +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad
2023-06-19 17:04:00.575033899 +0100
     @@ -1,37 +1,6 @@
      QA output created by 143
      wrote 131072/131072 bytes
      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
................
     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
................
     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
................
     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
................
     ...
     (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out
/home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad'  to see
the entire diff)
Ran: btrfs/143
Failures: btrfs/143
Failed 1 of 1 tests

The problem is this test is using dm-rust, and _scratch_cycle_mount()
doesn't work in that case.
It should be _unmount_dust() followed by_mount_dust() for this particular case.

Any idea on a better solution?

My current idea is to grab the mounted device of SCRATCH_MNT, then unmount and mount the grabbed device, instead of always using scratch device.

But it may look a little over-complicated and would affect too many test cases.

Or maybe we can detect if it's using dust device inside _btrfs_buffered_read_on_mirror() instead?

Thanks,
Qu


         while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
                 exec $XFS_IO_PROG \
                         -c "pread -b $size $offset $size" $file) ]]; do
--
2.39.1




[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux