Re: (subset) [PATCH 00/11] remove aux CQE caches

On 3/17/24 3:51 PM, Jens Axboe wrote:
> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>> The following two errors can be triggered with this patchset
>>>>>>>>>> when running some ublk stress tests (io vs. deletion). I don't see
>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>
>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>> reproduce it, blktests?
>>>>>>>>
>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>
>>>>>>>> You may need to run this test multiple times to trigger the problem:
>>>>>>>
>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>> by running liburing tests. Not sure what that is yet, but it might also
>>>>>>> be the reason.
>>>>>>
>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>> haven't seen anything odd.
>>>>>
>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both
>>>>> with the issue, and by full recompilation and config prompts I assumed
>>>>> you pulled something in between (maybe not).
>>>>>
>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>> shows when you try to shutdown the VM after tests.
>>>>
>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>> then let's see if it hits again.
>>>>
>>>> Did you run with KASAN enabled?
>>>
>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so on.
>>
>> And another note, I triggered it once (IIRC on shutdown) with ublk
>> tests only w/o liburing/tests, likely limits it to either the core
>> io_uring infra or non-io_uring bugs.
> 
> Been running on for-6.10/io_uring, and the only odd thing I see is that
> the test output tends to stall here:
> 
> Running test read-before-exit.t
> 
> which then either leads to a connection disconnect from my ssh into that
> vm, or just a long delay and then it picks up again. This did not happen
> with io_uring-6.9.
> 
> Maybe related? At least it's something new. Just checked again, and yeah
> it seems to totally lock up the vm while that is running. Will try a
> quick bisect of that series.

Seems to be triggered by the top of branch patch in there, my poll and
timeout special casing. While the above test case runs with that commit,
it'll freeze the host.

-- 
Jens Axboe




