Re: (subset) [PATCH 00/11] remove aux CQE caches

On 3/17/24 3:47 PM, Pavel Begunkov wrote:
> On 3/17/24 21:34, Pavel Begunkov wrote:
>> On 3/17/24 21:32, Jens Axboe wrote:
>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> The following two errors can be triggered with this patchset
>>>>>>>>> when running some ublk stress tests (io vs. deletion). I don't see
>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>
>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>> reproduce it, blktests?
>>>>>>>
>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>
>>>>>>> You may need to run this test multiple times to trigger the problem:
>>>>>>
>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>> by running liburing tests. Not sure what that is yet, but it might also
>>>>>> be the reason.
>>>>>
>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>> haven't seen anything odd.
>>>>
>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>> with the issue, and by full recompilation and config prompts I assumed
>>>> you pulled something in between (maybe not).
>>>>
>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>> shows when you try to shutdown the VM after tests.
>>>
>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>> the other times I ran it. In any case, once you repost I'll rebase and
>>> then let's see if it hits again.
>>>
>>> Did you run with KASAN enabled?
>>
>> Yes, it's a debug kernel, full on KASANs, lockdeps and so on.
> 
> And another note, I triggered it once (IIRC on shutdown) with ublk
> tests only w/o liburing/tests, which likely limits it to either the core
> io_uring infra or non-io_uring bugs.

Been running on for-6.10/io_uring, and the only odd thing I see is that
the test output tends to stall here:

Running test read-before-exit.t

which then either leads to a disconnect of my ssh session into that
VM, or just a long delay and then it picks up again. This did not happen
with io_uring-6.9.

Maybe related? At least it's something new. Just checked again, and yeah
it seems to totally lock up the VM while that is running. Will try a
quick bisect of that series.

-- 
Jens Axboe




