Re: [PATCH 2/4] task_work: don't run task_work if task_work_exited is queued

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/7/20 9:19 AM, Oleg Nesterov wrote:
> On 04/07, Jens Axboe wrote:
>>
>> On 4/7/20 4:39 AM, Oleg Nesterov wrote:
>>>
>>> IIUC, this is needed for the next change which adds task_work_run() into
>>> io_ring_ctx_wait_and_kill(), right?
>>
>> Right - so you'd rather I localize that check there instead? Can certainly
>> do that.
> 
> I am still not sure we need this check at all... probably this is because
> I don't understand the problem.

Probably because I'm not explaining it very well... Let me try. io_uring
uses the task_work to handle completion of poll requests. Either an
explicit poll, or one done implicitly because someone did a recv/send
(or whatever) on a socket that wasn't ready. When we get the poll
waitqueue callback, we queue up task_work to handle the completion of
it.

These can come in at any time, obviously, as space or data becomes
available. If the task is exiting, our task_work_add() fails, and we
queue with someone else. But there seems to be a case where it does get
queued locally, and then io_uring doesn't know if it's safe to run
task_work or not. Sometimes that manifests itself in hitting the RIP ==
0 case that I included here. With the work_pending && work != exit_work
in place, it works fine.

>>> could you explain how the exiting can call io_ring_ctx_wait_and_kill()
>>> after it passed exit_task_work() ?
>>
>> Sure, here's a trace where it happens:
> 
> but this task has not passed exit_task_work(),

But it's definitely hitting callback->func == NULL, which is the
exit_work. So if it's not from past exit_task_work(), where is it from?

> 
>>  __task_work_run+0x66/0xa0
>>  io_ring_ctx_wait_and_kill+0x14e/0x3c0
>>  io_uring_release+0x1c/0x20
>>  __fput+0xaa/0x200
>>  __task_work_run+0x66/0xa0
>>  do_exit+0x9cf/0xb40
> 
> So task_work_run() is called recursively from
> exit_task_work()->task_work_run().  See my another email, this is
> wrong with or without this series. And that is why I think
> task_work_run() hits work_exited.

I see your newer email on this, I'll go read it.

> Could you explain why io_ring_ctx_wait_and_kill() needs
> task_work_run() ?

Hopefully the above helped! If I'm way off somehow, cluebat away.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux