Re: Possible io_uring related race leads to btrfs data csum mismatch



On 8/16/23 7:31 PM, Qu Wenruo wrote:
> On 2023/8/17 09:23, Jens Axboe wrote:
>> On 8/16/23 7:19 PM, Qu Wenruo wrote:
>>> On 2023/8/17 09:12, Jens Axboe wrote:
>>>> On 8/16/23 7:05 PM, Qu Wenruo wrote:
>>>>> On 2023/8/17 06:28, Jens Axboe wrote:
>>>>> [...]
>>>>>>>> 2) What's the .config you are using?
>>>>>>> Pretty common config, no heavy debug options (KASAN etc).
>>>>>> Please just send the .config, I'd rather not have to guess. Things like
>>>>>> preempt etc may make a difference in reproducing this.
>>>>> Sure, please see the attached config.gz
>>>> Thanks
>>>>>> And just to be sure, this is not mixing dio and buffered, right?
>>>>> I'd say it's mixing, there are dwrite() and writev() for the same file,
>>>>> but at least they are not overlapping with this particular seed, nor
>>>>> are they concurrent (all inside the same process, sequentially).
>>>>> But considering that the bug no longer reproduces if only uring_write
>>>>> is disabled, there must be some untested btrfs path triggered by
>>>>> uring_write.
>>>> That would be one conclusion, another would be that timing is just
>>>> different and that triggers an issue. Or it could of course be a bug in
>>>> io_uring, perhaps a short write that gets retried or something like
>>>> that. I've run the tests for hours here and don't hit anything, I've
>>>> pulled in the for-next branch for btrfs to see if that'll make a
>>>> difference. I'll check your .config too.
>>> Just to mention, the problem itself was pretty hard to hit before if
>>> using any debug kernel configs.
>> The kernels I'm testing with don't have any debug options enabled,
>> outside of the basic cheap stuff. I do notice you have all btrfs debug
>> stuff enabled, I'll try and do that too.
>>> Not sure why but later I switched both my CPUs (from a desktop i7-13700K
>>> but with limited 160W power, to a laptop 7940HS), dropping all heavy
>>> debug kernel configs, then it's 100% reproducible here.
>>> So I guess a faster CPU is also one factor?
>> I've run this on kvm on an apple m1 max, no luck there. Ran it on a
>> 7950X, no luck there. Fiddling config options on the 7950 and booting up
>> the 7763 two socket box. Both that and the 7950 are using gen4 optane,
>> should be plenty beefy. But if it's timing related, well...
> Just to mention, the following progs are involved:
> - btrfs-progs v6.3.3
>   In theory anything newer than 5.15 should be fine, it's some default
>   settings change.

axboe@r7525 ~> apt show btrfs-progs
Package: btrfs-progs
Version: 6.3.2-1

is what I have.

> - fsstress from xfstests project
>   Thus it's not the one directly from LTP

That's what I'm using too.

> Hope this helps you reproduce the bug.

So far, not really :-)

Jens Axboe
