On 3/17/24 3:29 PM, Pavel Begunkov wrote:
> On 3/17/24 21:24, Jens Axboe wrote:
>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>> On 3/16/24 13:56, Ming Lei wrote:
>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>
>>>> ...
>>>>
>>>>>> The following two error can be triggered with this patchset
>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>> such failures after reverting the 11 patches.
>>>>>
>>>>> I suppose it's with the fix from yesterday. How can I
>>>>> reproduce it, blktests?
>>>>
>>>> Yeah, it needs yesterday's fix.
>>>>
>>>> You may need to run this test multiple times for triggering the problem:
>>>
>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>> in userspace waiting for CQEs but no complaints from the kernel.
>>> However, it seems the branch is buggy even without my patches, I
>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>> by running liburing tests. Not sure what is that yet, but might also
>>> be the reason.
>>
>> Hmm odd, there's nothing in there but your series and then the
>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>> haven't seen anything odd.
>
> Need to test io_uring-6.9. I actually checked the branch twice, both
> with the issue, and by full recompilation and config prompts I assumed
> you pulled something in between (maybe not).
>
> And yeah, I can't confirm it's specifically an io_uring bug, the
> stack trace is usually some unmap or task exit, sometimes it only
> shows when you try to shutdown the VM after tests.

Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
test case as well on io_uring-6.9 and it all worked fine. Trying
liburing tests on for-6.10/io_uring as well now, but didn't see anything
the other times I ran it.

In any case, once you repost I'll rebase and then let's see if it hits
again. Did you run with KASAN enabled?

-- 
Jens Axboe