On 3/17/24 3:47 PM, Pavel Begunkov wrote:
> On 3/17/24 21:34, Pavel Begunkov wrote:
>> On 3/17/24 21:32, Jens Axboe wrote:
>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> The following two error can be triggered with this patchset
>>>>>>>>> when running some ublk stress test(io vs. deletion). And not see
>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>
>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>> reproduce it, blktests?
>>>>>>>
>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>
>>>>>>> You may need to run this test multiple times for triggering the problem:
>>>>>>
>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>> by running liburing tests. Not sure what is that yet, but might also
>>>>>> be the reason.
>>>>>
>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>> haven't seen anything odd.
>>>>
>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>> with the issue, and by full recompilation and config prompts I assumed
>>>> you pulled something in between (maybe not).
>>>>
>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>> shows when you try to shutdown the VM after tests.
>>>
>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>> the other times I ran it. In any case, once you repost I'll rebase and
>>> then let's see if it hits again.
>>>
>>> Did you run with KASAN enabled
>>
>> Yes, it's a debug kernel, full on KASANs, lockdeps and so
>
> And another note, I triggered it once (IIRC on shutdown) with ublk
> tests only w/o liburing/tests, likely limits it to either the core
> io_uring infra or non-io_uring bugs.

Been running on for-6.10/io_uring, and the only odd thing I see is that
the test output tends to stall here:

Running test read-before-exit.t

which then either leads to a connection disconnect from my ssh into that
vm, or just a long delay and then it picks up again. This did not happen
with io_uring-6.9.

Maybe related? At least it's something new. Just checked again, and yeah
it seems to totally lock up the vm while that is running. Will try a
quick bisect of that series.

-- 
Jens Axboe