Re: Possible io_uring related race leads to btrfs data csum mismatch

On 8/16/23 12:52 AM, Qu Wenruo wrote:
> Hi,
> Recently I'm digging into a very rare failure during btrfs/06[234567],
> where btrfs scrub detects unrepairable data corruption.
> After days of digging, I have a much smaller reproducer:
> ```
> fail()
> {
>         echo "!!! FAILED !!!"
>         exit 1
> }
> workload()
> {
>         mkfs.btrfs -f -m single -d single --csum sha256 $dev1
>         mount $dev1 $mnt
>         # There are around 10 more combinations with different
>         # seed and -p/-n parameters, but this is the smallest one
>         # I found so far.
>         $fsstress -p 7 -n 50 -s 1691396493 -w -d $mnt
>         umount $mnt
>         btrfs check --check-data-csum $dev1 || fail
> }
> runtime=1024
> for (( i = 0; i < $runtime; i++ )); do
>         echo "=== $i / $runtime ==="
>         workload
> done
> ```

I tried to reproduce this, both in a VM and on a real host, with no luck
so far. I have a few follow-up questions, as your report is missing some
important info:

1) What kernel are you running?
2) What's the .config you are using?

> At least here, with a VM with 6 cores (host has 8C/16T), fast enough
> storage (PCIE4.0 NVME, with unsafe cache mode), it has the chance around
> 1/100 to hit the error.

What does "unsafe cache mode" mean? Does it mean write-back caching is
enabled? Write-back caching with a volatile write cache? For your device,
can you run the following and send the output?

$ grep . /sys/block/$dev/queue/*
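
As a rough sketch of how the info asked for above could be gathered (not
a required procedure): the helper names and the optional sysfs-root
argument below are made up for illustration, and the two kernel config
paths are just common locations that may not exist on every distro.

```shell
# Hypothetical helper: print every readable attribute under a device's
# queue directory, one "path:value" line per attribute.
# $1 = device name (e.g. vdb); $2 = sysfs root, an assumption added here
# so the helper can be pointed at a test directory (default /sys/block).
queue_settings()
{
        local dev=$1 root=${2:-/sys/block}
        # -s hides errors for subdirectories and unreadable attributes.
        grep -s . "$root/$dev/queue"/*
}

# Hypothetical helper: print the path of the running kernel's config,
# checking two common locations; returns non-zero if neither exists.
kernel_config_path()
{
        local p
        for p in /proc/config.gz "/boot/config-$(uname -r)"; do
                [ -r "$p" ] && { echo "$p"; return 0; }
        done
        return 1
}

# Example usage:
#   uname -r
#   kernel_config_path
#   queue_settings vdb
```

The write_cache line in that output is the one that says whether the
device reports "write back" or "write through".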

> Checking the fsstress verbose log against the failed file, it turns out
> to be an io_uring write.

Any more details on what the write looks like?

> And with uring_write disabled in fsstress, I have no longer reproduced
> the csum mismatch, even with much larger -n and -p parameters.

Is it more likely to reproduce with larger -n/-p in general?

> However I didn't see any io_uring related callback inside btrfs code,
> any advice on the io_uring part would be appreciated.

io_uring doesn't do anything special here, it uses the normal page cache
read/write paths for buffered IO. But you may get extra parallelism
with io_uring here. For example, with the buffered write that this most
likely is, libaio would behave exactly like a pwrite(2) on the file.
If the write would have blocked, io_uring offloads it to a helper
thread. Depending on the workload, you could have multiple of those in
progress at the same time.
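
To make the concurrency point concrete, here is a minimal sketch. It
does not use io_uring at all; it only mimics several buffered writes to
the same file being in flight at once, using background dd processes.
The function name and the four-block layout are made up for
illustration.

```shell
# Hypothetical helper: write four 4 KiB blocks to one file concurrently
# with buffered I/O, then compare against a sequentially built reference.
# $1 = file written in parallel; $2 = reference file built in order.
parallel_buffered_write()
{
        local target=$1 ref=$2 i
        : > "$target"
        : > "$ref"
        for i in 0 1 2 3; do
                # Block i consists of 4096 copies of the digit i.
                head -c 4096 /dev/zero | tr '\0' "$i" >> "$ref"
                # conv=notrunc keeps blocks written by the other writers.
                head -c 4096 /dev/zero | tr '\0' "$i" |
                        dd of="$target" bs=4096 seek="$i" conv=notrunc 2>/dev/null &
        done
        # Wait for all background writers before comparing.
        wait
        cmp -s "$target" "$ref"
}
```

On a healthy filesystem this should return 0. It is not a reproducer for
the csum mismatch, just an illustration of overlapping buffered writers.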

Jens Axboe
