Possible io_uring related race leads to btrfs data csum mismatch

Hi,

I've recently been digging into a very rare failure in btrfs/06[234567],
where btrfs scrub detects unrepairable data corruption.

After days of digging, I have a much smaller reproducer:

```
# Assumes $dev1 (scratch device), $mnt (mount point) and $fsstress
# (path to the fsstress binary) are already set in the environment.
fail()
{
        echo "!!! FAILED !!!"
        exit 1
}

workload()
{
        mkfs.btrfs -f -m single -d single --csum sha256 "$dev1"
        mount "$dev1" "$mnt"
        # There are around 10 more combinations with different
        # seed and -p/-n parameters, but this is the smallest one
        # I found so far.
        "$fsstress" -p 7 -n 50 -s 1691396493 -w -d "$mnt"
        umount "$mnt"
        # Verify data checksums offline; any mismatch aborts the loop.
        btrfs check --check-data-csum "$dev1" || fail
}

runtime=1024
for (( i = 0; i < runtime; i++ )); do
        echo "=== $i / $runtime ==="
        workload
done
```

At least here, in a VM with 6 cores (the host has 8C/16T) and fast
storage (PCIe 4.0 NVMe, with unsafe cache mode), the error hits roughly
1 time in 100.

Checking the fsstress verbose log for the failed file shows that the
operation involved is an io_uring write.

With uring_write disabled in fsstress, I can no longer reproduce the
csum mismatch, even with much larger -n and -p values.
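
For anyone wanting to repeat that comparison, here is a minimal sketch of
one way to turn the operation off. It assumes the fsstress build accepts
-f name=freq and names the operation uring_write, and it is not
necessarily how it was disabled for the runs above. The helper name is
hypothetical, and it only prints the command line instead of running it:

```shell
# Hypothetical helper: print the reproducer's fsstress command line with
# io_uring writes disabled. $1 is the mount point to stress.
fsstress_cmd_no_uring_write()
{
        # Assumption: "-f name=freq" sets the frequency of one operation,
        # and a frequency of 0 disables it. Falls back to "fsstress" on
        # PATH when $fsstress is not set.
        echo "${fsstress:-fsstress} -p 7 -n 50 -s 1691396493 -w -f uring_write=0 -d $1"
}
```

Inside workload(), the printed command would take the place of the
original fsstress line; everything else in the loop stays the same.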

However, I didn't see any io_uring related callbacks inside the btrfs
code, so any advice on the io_uring part would be appreciated.

Thanks,
Qu



