Re: Possible io_uring related race leads to btrfs data csum mismatch



On 8/16/23 7:05 PM, Qu Wenruo wrote:
> On 2023/8/17 06:28, Jens Axboe wrote:
> [...]
>>>> 2) What's the .config you are using?
>>> Pretty common config, no heavy debug options (KASAN etc).
>> Please just send the .config, I'd rather not have to guess. Things like
>> preempt etc may make a difference in reproducing this.
> Sure, please see the attached config.gz


>> And just to be sure, this is not mixing dio and buffered, right?
> I'd say it's mixing, there are dwrite() and writev() for the same file,
> but at least not overlapping using this particular seed, nor are they
> concurrent (all inside the same process sequentially).
> But considering that it no longer reproduces once only uring_write is
> disabled, there must be some untested btrfs path triggered by
> uring_write.

That would be one conclusion; another would be that the timing is just
different and that triggers an issue. Or it could of course be a bug in
io_uring, perhaps a short write that gets retried or something like
that. I've run the tests for hours here and haven't hit anything. I've
pulled in the for-next branch for btrfs to see if that'll make a
difference. I'll check your .config too.

Might not be a bad idea to have the writes contain known data, and when
you hit the failure to verify the csum, dump the data where the csum
says it's wrong and figure out at what offset, what content, etc it is?
If that can get correlated to the log of what happened, that might shed
some light on this.
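
One way to make the written data self-describing is sketched below, in
Python for brevity. This is only an illustration under stated assumptions:
the names pattern(), first_mismatch() and demo(), and the 16-bit "tag"
identifying a write, are hypothetical and not part of fsstress or any
existing tool. Every 8-byte word encodes its own file offset plus the tag,
so a dumped block shows where it belongs and which write produced it. How
to get at the raw block that fails the csum is left out.

```python
import os
import struct
import tempfile

WORD = 8  # each aligned 8-byte word of the file encodes its own offset


def pattern(offset, length, tag):
    """Return the expected bytes for file range [offset, offset + length).

    Every aligned word holds (tag << 48) | word_offset, little endian.
    Assumes offsets fit in 48 bits and tags in 16 bits.
    """
    start = offset - offset % WORD
    end = offset + length
    words = b"".join(
        struct.pack("<Q", ((tag & 0xFFFF) << 48) | pos)
        for pos in range(start, end, WORD)
    )
    skip = offset - start
    return words[skip:skip + length]


def first_mismatch(path, offset, length, tag):
    """Return the file offset of the first byte that differs, or None."""
    fd = os.open(path, os.O_RDONLY)
    try:
        got = os.pread(fd, length, offset)
    finally:
        os.close(fd)
    want = pattern(offset, length, tag)
    for i in range(length):
        # A short read counts as a mismatch at the first missing byte.
        if i >= len(got) or got[i] != want[i]:
            return offset + i
    return None


def demo():
    """Write a tagged range, corrupt one byte, and locate it."""
    fd, path = tempfile.mkstemp()
    try:
        os.pwrite(fd, pattern(4096, 8192, tag=7), 4096)
        os.pwrite(fd, b"\xff", 5000)  # simulate a stray modification
        return first_mismatch(path, 4096, 8192, tag=7)
    finally:
        os.close(fd)
        os.unlink(path)
```

With a per-write tag recorded in the operation log, the offset returned by
first_mismatch() and the tag found in the bad data could be matched against
the log to see which write, if any, the unexpected content came from.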

>>>>> However I didn't see any io_uring related callback inside btrfs code,
>>>>> any advice on the io_uring part would be appreciated.
>>>> io_uring doesn't do anything special here, it uses the normal page cache
>>>> read/write parts for buffered IO. But you may get extra parallelism
>>>> with io_uring here. For example, with the buffered write that this most
>>>> likely is, libaio would be exactly the same as a pwrite(2) on the file.
>>>> If this would've blocked, io_uring would offload this to a helper
>>>> thread. Depending on the workload, you could have multiple of those in
>>>> progress at the same time.
>>> My biggest concern is, would io_uring modify the page when it's still
>>> under writeback?
>> No, of course not. Like I mentioned, io_uring doesn't do anything that
>> the normal read/write path isn't already doing - it's using the same
>> ->read_iter() and ->write_iter() that everything else is, there's no
>> page cache code in io_uring.
>>> In that case, it's going to cause csum mismatch as btrfs relies on the
>>> page under writeback to be unchanged.
>> Sure, I'm aware of the stable page requirements.
>> See my followup email as well for a patch to test.
> Applied and tested, using "-p 10 -n 1000" as fsstress workload, failed
> at 23rd run.

OK, that rules out the multiple-writers theory.

Jens Axboe
