On 8/16/23 7:31 PM, Qu Wenruo wrote:
>
>
> On 2023/8/17 09:23, Jens Axboe wrote:
>> On 8/16/23 7:19 PM, Qu Wenruo wrote:
>>> On 2023/8/17 09:12, Jens Axboe wrote:
>>>> On 8/16/23 7:05 PM, Qu Wenruo wrote:
>>>>>
>>>>>
>>>>> On 2023/8/17 06:28, Jens Axboe wrote:
>>>>> [...]
>>>>>>
>>>>>>>> 2) What's the .config you are using?
>>>>>>>
>>>>>>> Pretty common config, no heavy debug options (KASAN etc).
>>>>>>
>>>>>> Please just send the .config, I'd rather not have to guess. Things like
>>>>>> preempt etc may make a difference in reproducing this.
>>>>>
>>>>> Sure, please see the attached config.gz
>>>>
>>>> Thanks
>>>>
>>>>>> And just to be sure, this is not mixing dio and buffered, right?
>>>>>
>>>>> I'd say it's mixing, there are dwrite() and writev() for the same file,
>>>>> but at least not overlapping using this particular seed, nor they are
>>>>> concurrent (all inside the same process sequentially).
>>>>>
>>>>> But considering if only uring_write is disabled, then no more reproduce,
>>>>> thus there must be some untested btrfs path triggered by uring_write.
>>>>
>>>> That would be one conclusion, another would be that timing is just
>>>> different and that triggers and issue. Or it could of course be a bug in
>>>> io_uring, perhaps a short write that gets retried or something like
>>>> that. I've run the tests for hours here and don't hit anything, I've
>>>> pulled in the for-next branch for btrfs and see if that'll make a
>>>> difference. I'll check your .config too.
>>>
>>> Just to mention, the problem itself was pretty hard to hit before if
>>> using any debug kernel configs.
>>
>> The kernels I'm testing with don't have any debug options enabled,
>> outside of the basic cheap stuff. I do notice you have all btrfs debug
>> stuff enabled, I'll try and do that too.
>>
>>> Not sure why but later I switched both my CPUs (from a desktop i7-13700K
>>> but with limited 160W power, to a laptop 7940HS), dropping all heavy
>>> debug kernel configs, then it's 100% reproducible here.
>>>
>>> So I guess a faster CPU is also one factor?
>>
>> I've run this on kvm on an apple m1 max, no luck there. Ran it on a
>> 7950X, no luck there. Fiddling config options on the 7950 and booting up
>> the 7763 two socket box. Both that and the 7950 are using gen4 optane,
>> should be plenty beefy. But if it's timing related, well...
>
> Just to mention, the following progs are involved:
>
> - btrfs-progs v6.3.3
>   In theory anything newer than 5.15 should be fine, it's some default
>   settings change.

axboe@r7525 ~> apt show btrfs-progs
Package: btrfs-progs
Version: 6.3.2-1

is what I have.

> - fsstress from xfstests project
>   Thus it's not the one directly from LTP

That's what I'm using too.

> Hopes this could help you to reproduce the bug.

So far, not really :-)

--
Jens Axboe