On Tue, Jun 20, 2023 at 2:07 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
>
> On 2023/6/20 00:13, Filipe Manana wrote:
> > On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@xxxxxxxx> wrote:
> >>
> >> [BUG]
> >> There is a chance that btrfs/266 would fail on aarch64 with 64K page
> >> size (no matter whether the sector size is 4K or 64K).
> >>
> >> The failure indicates that one or more mirrors are not properly fixed.
> >>
> >> [CAUSE]
> >> I have added some trace events into the btrfs IO paths, including
> >> __btrfs_submit_bio() and __end_bio_extent_readpage().
> >>
> >> When the test failed, the trace looks like this:
> >>
> >>  112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
> >>                 ^^^ Initial read on the full 192K file
> >>  112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
> >>                 ^^^ Repair of the first 64K block, which would succeed
> >>  112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
> >>                 ^^^ Repair of the second 64K block, which would fail
> >>  112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
> >>                 ^^^ Repair of that same block from mirror 3, which would succeed
> >>  112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
> >>                 ^^^ The repair succeeded, the full 192K read finished
> >>
> >>  112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
> >>  112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
> >>  112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
> >>  112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
> >>
> >> But the above reads only happen for mirrors 1 and 3; mirror 2 is not
> >> involved.
> >> This means that somehow the read on mirror 2 didn't happen, most
> >> likely because something went wrong during the drop_caches call.
> >>
> >> It may be an async operation or some hardware specific behavior.
> >>
> >> On the other hand, the test case doing the same operation but using
> >> direct IO, aka btrfs/267, never fails, as we never populate the page
> >> cache and thus always read from the disk.
> >>
> >> [WORKAROUND]
> >> The root cause is in the "echo 3 > drop_caches" call, which I'm still
> >> debugging.
> >>
> >> But in the meantime, we can work around the problem by forcing a
> >> cycle mount of the scratch device inside
> >> _btrfs_buffered_read_on_mirror().
> >>
> >> This ensures all page caches are dropped, no matter whether
> >> drop_caches is working correctly.
> >>
> >> With this patch, I no longer hit the failure on aarch64 with 64K page
> >> size, while before it the test case would always fail during a
> >> -I 10 run.
> >>
> >> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> >> ---
> >> RFC:
> >> The root cause is still inside that "echo 3 > drop_caches", but I
> >> don't have any better solution if such a fundamental debug facility
> >> is not working as expected.
> >>
> >> Thus this is only a workaround, until I can pin down the root cause
> >> of that drop_caches hiccup.
> >> ---
> >>  common/btrfs | 5 +++++
> >>  1 file changed, 5 insertions(+)
> >>
> >> diff --git a/common/btrfs b/common/btrfs
> >> index 344509ce..1d522c33 100644
> >> --- a/common/btrfs
> >> +++ b/common/btrfs
> >> @@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
> >>         local size=$5
> >>
> >>         echo 3 > /proc/sys/vm/drop_caches
> >> +       # The above drop_caches doesn't seem to drop every page on
> >> +       # aarch64 with 64K page size.
> >> +       # So here as a workaround, cycle mount the SCRATCH_MNT to
> >> +       # ensure the caches are dropped.
> >> +       _scratch_cycle_mount
> >
> > Btw, I'm getting some tests failing because of this change.
> >
> > For e.g.:
> >
> > ./check btrfs/143
> > FSTYP         -- btrfs
> > PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
> > MKFS_OPTIONS  -- /dev/sdc
> > MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
> >
> > btrfs/143 6s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
> >     --- tests/btrfs/143.out 2020-06-10 19:29:03.818519162 +0100
> >     +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad 2023-06-19 17:04:00.575033899 +0100
> >     @@ -1,37 +1,6 @@
> >      QA output created by 143
> >      wrote 131072/131072 bytes
> >      XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> >     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> >     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> >     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> >     -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
> >     ...
> >     (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad' to see the entire diff)
> > Ran: btrfs/143
> > Failures: btrfs/143
> > Failed 1 of 1 tests
> >
> > The problem is that this test uses dm-dust, and _scratch_cycle_mount()
> > doesn't work in that case.
> > It should be _unmount_dust() followed by _mount_dust() for this
> > particular case.
> > Any idea on a better solution?
>
> My current idea is to grab the mounted device of SCRATCH_MNT, then
> unmount and mount the grabbed device, instead of always using the
> scratch device.

Maybe that, yes. (A rough sketch of that idea is at the end of this
mail.)

>
> But it may look a little over-complicated and would affect too many
> test cases.
>
> Or maybe we can detect if it's using a dust device inside
> _btrfs_buffered_read_on_mirror() instead?

Well, that sounds fragile. Maybe there are other tests using dm-flakey
instead of dm-dust, or some other target. Or maybe not now, but in the
future...

>
> Thanks,
> Qu
>
> >
> >>         while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
> >>                 exec $XFS_IO_PROG \
> >>                 -c "pread -b $size $offset $size" $file) ]]; do
> >> --
> >> 2.39.1
> >>
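
Just to make the "grab the mounted device of SCRATCH_MNT" idea a bit more
concrete, a rough, untested sketch is below. The helper name is made up
(it is not an existing fstests function), it uses findmnt to find the
device currently mounted at SCRATCH_MNT, and it does not preserve any
extra mount options a test may have used, so treat it only as an
illustration of the idea, not a drop-in replacement for
_scratch_cycle_mount():

	# Illustrative sketch only: cycle whatever device is currently
	# mounted at SCRATCH_MNT, instead of assuming it is SCRATCH_DEV.
	# That way a dm target (dust, flakey, ...) stacked on top of the
	# scratch device keeps being used after the remount.
	_cycle_scratch_mnt_device()
	{
		local dev

		# The source of the mountpoint may be a dm device rather
		# than SCRATCH_DEV itself.
		dev=$(findmnt -n -o SOURCE "$SCRATCH_MNT") || return 1

		$UMOUNT_PROG "$SCRATCH_MNT" || return 1
		$MOUNT_PROG "$dev" "$SCRATCH_MNT"
	}

Then _btrfs_buffered_read_on_mirror() would call something like that
instead of _scratch_cycle_mount, so tests sitting on a dm target (like
btrfs/143) would remount the same device they were already using.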