Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2)

On 11/4/24 8:34 AM, Pavel Begunkov wrote:
> On 11/4/24 15:27, Pavel Begunkov wrote:
>> On 11/4/24 15:08, Jens Axboe wrote:
>>> On 11/4/24 6:13 AM, Pavel Begunkov wrote:
>>>> On 11/4/24 11:31, syzbot wrote:
>>>>> syzbot has bisected this issue to:
>>>>>
>>>>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706
>>>>> Author: Jens Axboe <axboe@xxxxxxxxx>
>>>>> Date:   Sat Oct 26 01:27:39 2024 +0000
>>>>>
>>>>>       io_uring/rsrc: get rid of per-ring io_rsrc_node list
>>>>>
>>>>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000
>>>>> start commit:   c88416ba074a Add linux-next specific files for 20241101
>>>>> git tree:       linux-next
>>>>> final oops:     https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000
>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000
>>>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f
>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2
>>>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000
>>>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12c04740580000
>>>>>
>>>>> Reported-by: syzbot+e333341d3d985e5173b2@xxxxxxxxxxxxxxxxxxxxxxxxx
>>>>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
>>>>>
>>>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>>
>>>> Previously all puts were done by requests, which in case of an exiting
>>>> ring were fallback'ed to normal tw. Now, the unregister path posts CQEs,
>>>> while the original task is still alive. Should be fine in general because
>>>> at this point there could be no requests posting in parallel and all
>>>> is synchronised, so it's a false positive, but we need to change the assert
>>>> or something else.
>>>
>>> Maybe something ala the below? Also changes these triggers to be
>>> _once(), no point spamming them.
>>>
>>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>>> index 00409505bf07..7792ed91469b 100644
>>> --- a/io_uring/io_uring.h
>>> +++ b/io_uring/io_uring.h
>>> @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
>>>            * Not from an SQE, as those cannot be submitted, but via
>>>            * updating tagged resources.
>>>            */
>>> -        if (ctx->submitter_task->flags & PF_EXITING)
>>> -            lockdep_assert(current_work());
>>> +        if (ctx->submitter_task->flags & PF_EXITING ||
>>> +            percpu_ref_is_dying(&ctx->refs))
>>
>> io_move_task_work_from_local() executes requests with a normal
>> task_work of a possibly alive task, which will trip the check.
>>
>> I was thinking to kill the extra step as it doesn't make sense,
>> git garbage digging shows the patch below, but I don't remember
>> if it has ever been tested.
>>
>>
>> commit 65560732da185c85f472e9c94e6b8ff147fc4b96
>> Author: Pavel Begunkov <asml.silence@xxxxxxxxx>
>> Date:   Fri Jun 7 13:13:06 2024 +0100
>>
>>      io_uring: skip normal tw with DEFER_TASKRUN
>>      DEFER_TASKRUN execution first falls back to normal task_work and only
>>      then, when the task is dying, to workers. It's cleaner to remove the
>>      middle step and use workers as the only fallback. It also detaches
>>      DEFER_TASKRUN and normal task_work handling from each other.
>>      Signed-off-by: Pavel Begunkov <asml.silence@xxxxxxxxx>
> 
> Not sure what spacing got broken here.
> 
> Regardless, the rule with sth like that should be simpler,
> i.e. a ctx is getting killed => everything is run from fallback/kthread.

I like it, and now there's another reason to do it. Can you send out the
patch?

-- 
Jens Axboe



