On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
> [BUG]
> There is a chance that btrfs/266 would fail on aarch64 with 64K page
> size (no matter if it's 4K sector size or 64K sector size).
>
> The failure indicates that one or more mirrors are not properly fixed.
>
> [CAUSE]
> I have added some trace events into the btrfs IO paths, including
> __btrfs_submit_bio() and __end_bio_extent_readpage().
>
> When the test failed, the trace looks like this:
>
>  112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
>  ^^^ Initial read on the full 192K file
>
>  112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
>  ^^^ Repair on the first 64K block, which would succeed
>
>  112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
>  ^^^ Repair on the second 64K block, which would fail
>
>  112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
>  ^^^ Repair retry on the second 64K block using mirror 3, which would
>      succeed
>
>  112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
>  ^^^ The repair succeeded, the full 192K read finished
>
>  112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
>  112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
>  112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
>  112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
>
> But the above reads only happen for mirror 1 and mirror 3; mirror 2 is
> not involved.
> This means that somehow the read on mirror 2 didn't happen, most likely
> due to something going wrong during the drop_caches call.
>
> It may be an async operation or some hardware specific behavior.
>
> On the other hand, the test case doing the same operation but utilizing
> direct IO, aka btrfs/267, never fails, as we never populate the page
> cache and thus always read from the disk.
>
> [WORKAROUND]
> The root cause is in the "echo 3 > drop_caches" call, which I'm still
> debugging.
>
> But in the meantime, we can work around the problem by forcing a cycle
> mount of the scratch device inside _btrfs_buffered_read_on_mirror().
>
> This ensures all page caches are dropped, no matter whether drop_caches
> is working correctly.
>
> With this patch, I no longer hit the failure on aarch64 with 64K page
> size, while before it the test case would always fail during a -I 10
> run.
>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> RFC:
> The root cause is still inside that "echo 3 > drop_caches", but I don't
> have any better solution if such a fundamental debug facility is not
> working as expected.
>
> Thus this is only a workaround, until I can pin down the root cause of
> that drop_caches hiccup.
> ---
>  common/btrfs | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/common/btrfs b/common/btrfs
> index 344509ce..1d522c33 100644
> --- a/common/btrfs
> +++ b/common/btrfs
> @@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
>         local size=$5
>
>         echo 3 > /proc/sys/vm/drop_caches
> +       # The above drop_caches doesn't seem to drop every page on
> +       # aarch64 with 64K page size.
> +       # So as a workaround, cycle mount SCRATCH_MNT to ensure the
> +       # caches are dropped.
> +       _scratch_cycle_mount

Btw, I'm getting some tests failing because of this change. For e.g.:

./check btrfs/143
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/143 6s ...
[failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
    --- tests/btrfs/143.out	2020-06-10 19:29:03.818519162 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad	2023-06-19 17:04:00.575033899 +0100
    @@ -1,37 +1,6 @@
     QA output created by 143
     wrote 131072/131072 bytes
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad' to see the entire diff)
Ran: btrfs/143
Failures: btrfs/143
Failed 1 of 1 tests

The problem is that this test uses dm-dust, and _scratch_cycle_mount()
doesn't work in that case. It should be _unmount_dust() followed by
_mount_dust() for this particular case.

>         while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
>                 exec $XFS_IO_PROG \
>                 -c "pread -b $size $offset $size" $file) ]]; do
> --
> 2.39.1
>
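The suggested fix could be sketched roughly as below. This is a hypothetical illustration, not actual fstests code: the helpers _scratch_cycle_mount, _unmount_dust and _mount_dust are the real names from the thread, but here they are replaced by stubs that just record which path runs, and the DUST_IN_USE flag is invented for the sketch (fstests would detect a dm-dust setup differently).

```shell
#!/bin/bash
# Hypothetical sketch of the proposed logic: when the scratch device is
# wrapped by dm-dust, cycle the mount through the dust helpers instead
# of _scratch_cycle_mount. The stubs below only echo the chosen path.

DUST_IN_USE=1   # invented flag: 1 when dm-dust wraps the scratch device

_scratch_cycle_mount() { echo "plain cycle mount"; }
_unmount_dust()        { echo "unmount dm-dust"; }
_mount_dust()          { echo "mount dm-dust"; }

# Drop all page cache by cycling the mount, picking the helper pair
# that matches how the scratch device is currently set up.
_cycle_scratch_or_dust() {
	if [ "$DUST_IN_USE" = 1 ]; then
		_unmount_dust
		_mount_dust
	else
		_scratch_cycle_mount
	fi
}

_cycle_scratch_or_dust
```

With DUST_IN_USE=1 this prints the dm-dust unmount/mount pair; with DUST_IN_USE=0 it falls back to the plain cycle mount, matching the behavior the reply asks for.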