On 4/7/20 9:19 AM, Oleg Nesterov wrote: > On 04/07, Jens Axboe wrote: >> >> On 4/7/20 4:39 AM, Oleg Nesterov wrote: >>> >>> IIUC, this is needed for the next change which adds task_work_run() into >>> io_ring_ctx_wait_and_kill(), right? >> >> Right - so you'd rather I localize that check there instead? Can certainly >> do that. > > I am still not sure we need this check at all... probably this is because > I don't understand the problem. Probably because I'm not explaining it very well... Let me try. io_uring uses the task_work to handle completion of poll requests. Either an explicit poll, or one done implicitly because someone did a recv/send (or whatever) on a socket that wasn't ready. When we get the poll waitqueue callback, we queue up task_work to handle the completion of it. These can come in at any time, obviously, as space or data becomes available. If the task is exiting, our task_work_add() fails, and we queue with someone else. But there seems to be a case where it does get queued locally, and then io_uring doesn't know if it's safe to run task_work or not. Sometimes that manifests itself in hitting the RIP == 0 case that I included here. With the work_pending && work != exit_work in place, it works fine. >>> could you explain how the exiting can call io_ring_ctx_wait_and_kill() >>> after it passed exit_task_work() ? >> >> Sure, here's a trace where it happens: > > but this task has not passed exit_task_work(), But it's definitely hitting callback->func == NULL, which is the exit_work. So if it's not from past exit_task_work(), where is it from? > >> __task_work_run+0x66/0xa0 >> io_ring_ctx_wait_and_kill+0x14e/0x3c0 >> io_uring_release+0x1c/0x20 >> __fput+0xaa/0x200 >> __task_work_run+0x66/0xa0 >> do_exit+0x9cf/0xb40 > > So task_work_run() is called recursively from > exit_task_work()->task_work_run(). See my another email, this is > wrong with or without this series. And that is why I think > task_work_run() hits work_exited. I see your newer email on this, I'll go read it. > Could you explain why io_ring_ctx_wait_and_kill() needs > task_work_run() ? Hopefully the above helped! If I'm way off somehow, cluebat away. -- Jens Axboe