On Thu, Jun 8, 2023 at 1:02 PM Qu Wenruo <wqu@xxxxxxxx> wrote:
>
> [BUG]
> There is a chance that btrfs/266 would fail on aarch64 with 64K page
> size (no matter if it's 4K sector size or 64K sector size).
>
> The failure indicates that one or more mirrors are not properly fixed.
>
> [CAUSE]
> I have added some trace events into the btrfs IO paths, including
> __btrfs_submit_bio() and __end_bio_extent_readpage().
>
> When the test failed, the trace looks like this:
>
>  112819.764977: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=196608 pid=33663
>  ^^^ Initial read on the full 192K file
>
>  112819.766604: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=2 len=65536 pid=21806
>  ^^^ Repair on the first 64K block, which would succeed
>
>  112819.766760: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=2 len=65536 pid=21806
>  ^^^ Repair on the second 64K block, which would fail
>
>  112819.767625: __btrfs_submit_bio: r/i=5/257 fileoff=65536 mirror=3 len=65536 pid=21806
>  ^^^ Repair retry on the second 64K block using mirror 3, which would
>      succeed
>
>  112819.770999: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=1 status=0
>  ^^^ The repair succeeded, the full 192K read finished
>
>  112819.864851: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=3 len=196608 pid=33665
>  112819.874854: __btrfs_submit_bio: r/i=5/257 fileoff=0 mirror=1 len=65536 pid=31369
>  112819.875029: __btrfs_submit_bio: r/i=5/257 fileoff=131072 mirror=1 len=65536 pid=31369
>  112819.876926: end_bio_extent_readpage: read finished, r/i=5/257 fileoff=0 len=196608 mirror=3 status=0
>
> But the above reads only happen for mirror 1 and mirror 3; mirror 2 is
> not involved.
> This means that somehow the read on mirror 2 didn't happen, most likely
> due to something going wrong during the drop_caches call.
>
> It may be an async operation or some hardware specific behavior.
>
> On the other hand, the test case doing the same operation but utilizing
> direct IO, aka btrfs/267, never fails, as we never populate the page
> cache and thus always read from the disk.
>
> [WORKAROUND]
> The root cause is in the "echo 3 > drop_caches" call, which I'm still
> debugging.
>
> But in the meantime, we can work around the problem by forcing a cycle
> mount of the scratch device inside _btrfs_buffered_read_on_mirror().
>
> This ensures all page caches are dropped, no matter whether drop_caches
> is working correctly.
>
> With this patch, I no longer hit the failure on aarch64 with 64K page
> size, while before it the test case would always fail during a -I 10
> run.
>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> RFC:
> The root cause is still inside that "echo 3 > drop_caches", but I don't
> have any better solution if such a fundamental debug facility is not
> working as expected.
>
> Thus this is only a workaround, until I can pin down the root cause of
> that drop_caches hiccup.
> ---
>  common/btrfs | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/common/btrfs b/common/btrfs
> index 344509ce..1d522c33 100644
> --- a/common/btrfs
> +++ b/common/btrfs
> @@ -599,6 +599,11 @@ _btrfs_buffered_read_on_mirror()
>         local size=$5
>
>         echo 3 > /proc/sys/vm/drop_caches
> +       # The above drop_caches doesn't seem to drop every page on
> +       # aarch64 with 64K page size.
> +       # So as a workaround, cycle mount SCRATCH_MNT to ensure the
> +       # caches are dropped.
> +       _scratch_cycle_mount

Btw, I'm getting some tests failing because of this change. For e.g.:

./check btrfs/143
FSTYP         -- btrfs
PLATFORM      -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS  -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1

btrfs/143 6s ...
[failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad)
    --- tests/btrfs/143.out	2020-06-10 19:29:03.818519162 +0100
    +++ /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad	2023-06-19 17:04:00.575033899 +0100
    @@ -1,37 +1,6 @@
     QA output created by 143
     wrote 131072/131072 bytes
     XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    -XXXXXXXX:  aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa  ................
    ...
    (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/143.out /home/fdmanana/git/hub/xfstests/results//btrfs/143.out.bad' to see the entire diff)
Ran: btrfs/143
Failures: btrfs/143
Failed 1 of 1 tests

The problem is that this test uses dm-dust, and _scratch_cycle_mount()
doesn't work in that case. It should be _unmount_dust() followed by
_mount_dust() for this particular case.

>         while [[ -z $( (( BASHPID % nr_mirrors == mirror )) &&
>                 exec $XFS_IO_PROG \
>                 -c "pread -b $size $offset $size" $file) ]]; do
> --
> 2.39.1
>
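The suggested fix could be sketched roughly as below. This is a hypothetical illustration, not actual fstests code: the helpers _scratch_cycle_mount, _unmount_dust and _mount_dust are the real names from the thread, but here they are replaced by stubs that just record which path runs, and the DUST_IN_USE flag is invented for the sketch (fstests would detect a dm-dust setup differently).

```shell
#!/bin/bash
# Hypothetical sketch of the proposed logic: when the scratch device is
# wrapped by dm-dust, cycle the mount through the dust helpers instead
# of _scratch_cycle_mount. The stubs below only echo the chosen path.

DUST_IN_USE=1   # invented flag: 1 when dm-dust wraps the scratch device

_scratch_cycle_mount() { echo "plain cycle mount"; }
_unmount_dust()        { echo "unmount dm-dust"; }
_mount_dust()          { echo "mount dm-dust"; }

# Drop all page cache by cycling the mount, picking the helper pair
# that matches how the scratch device is currently set up.
_cycle_scratch_or_dust() {
	if [ "$DUST_IN_USE" = 1 ]; then
		_unmount_dust
		_mount_dust
	else
		_scratch_cycle_mount
	fi
}

_cycle_scratch_or_dust
```

With DUST_IN_USE=1 this prints the dm-dust unmount/mount pair; with DUST_IN_USE=0 it falls back to the plain cycle mount, matching the behavior the reply asks for.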