On 8/16/23 12:52 AM, Qu Wenruo wrote:
> Hi,
>
> Recently I'm digging into a very rare failure during btrfs/06[234567],
> where btrfs scrub detects unrepairable data corruption.
>
> After days of digging, I have a much smaller reproducer:
>
> ```
> fail()
> {
>         echo "!!! FAILED !!!"
>         exit 1
> }
>
> workload()
> {
>         mkfs.btrfs -f -m single -d single --csum sha256 $dev1
>         mount $dev1 $mnt
>         # There are around 10 more combinations with different
>         # seed and -p/-n parameters, but this is the smallest one
>         # I found so far.
>         $fsstress -p 7 -n 50 -s 1691396493 -w -d $mnt
>         umount $mnt
>         btrfs check --check-data-csum $dev1 || fail
> }
> runtime=1024
> for (( i = 0; i < $runtime; i++ )); do
>         echo "=== $i / $runtime ==="
>         workload
> done
> ```

Tried to reproduce this, both on a vm and on a real host, and no luck so
far. I've got a few followup questions as your report is missing some
important info:

1) What kernel are you running?
2) What's the .config you are using?

> At least here, with a VM with 6 cores (host has 8C/16T), fast enough
> storage (PCIE4.0 NVME, with unsafe cache mode), it has the chance around
> 1/100 to hit the error.

What does "unsafe cache mode" mean? Is that write back caching enabled?
Write back caching with volatile write cache? For your device, can you
do:

$ grep . /sys/block/$dev/queue/*

> Checking the fsstress verbose log against the failed file, it turns out
> to be an io_uring write.

Any more details on what the write looks like?

> And with uring_write disabled in fsstress, I have no longer reproduced
> the csum mismatch, even with much larger -n and -p parameters.

Is it more likely to reproduce with larger -n/-p in general?

> However I didn't see any io_uring related callback inside btrfs code,
> any advice on the io_uring part would be appreciated.

io_uring doesn't do anything special here, it uses the normal page cache
read/write parts for buffered IO. But you may get extra parallelism
with io_uring here.
For example, with the buffered write that this most likely is,
libaio would be exactly the same as a pwrite(2) on the file. If this
would've blocked, io_uring would offload this to a helper thread.
Depending on the workload, you could have multiple of those in progress
at the same time.

-- 
Jens Axboe
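
To make the parallelism point concrete, here's a rough sketch of several
buffered writers being active on one file at the same time. It does not
use io_uring at all: plain background dd processes stand in for the
helper threads, and the scratch file, writer count and block size are
all made up for illustration. I would not expect it to reproduce the
csum mismatch by itself, it only shows the shape of the access pattern.

```shell
# Assumption: background dd processes stand in for io_uring helper threads.
set -eu

file=$(mktemp)   # hypothetical scratch file; put it on the fs under test
nwriters=7       # arbitrary, kept below 10 so each index is one character
bs=4096

i=0
while [ "$i" -lt "$nwriters" ]; do
    # Writer i fills block i with the character "$i" through the page
    # cache. conv=notrunc leaves the blocks of the other writers alone.
    head -c "$bs" /dev/zero | tr '\000' "$i" |
        dd of="$file" bs="$bs" seek="$i" conv=notrunc 2>/dev/null &
    i=$((i + 1))
done
wait             # until here, any number of the writes may be in flight

size=$(( $(wc -c < "$file") ))

# Read every block back and count the bytes that are not the expected one.
bad=0
i=0
while [ "$i" -lt "$nwriters" ]; do
    stray=$(dd if="$file" bs="$bs" skip="$i" count=1 2>/dev/null |
        tr -d "$i" | wc -c)
    [ "$stray" -eq 0 ] || bad=$((bad + 1))
    i=$((i + 1))
done

echo "size=$size bad_blocks=$bad"
rm -f "$file"
```

The writers target disjoint blocks, so a correct filesystem should end
up with every block intact regardless of the order in which the writes
land.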