Re: [PATCH 4/4] io_uring: flush task work before waiting for ring exit

Jens Axboe <axboe@xxxxxxxxx> · Tue, 7 Apr 2020 13:39:06 -0700

On 4/7/20 1:30 PM, Jens Axboe wrote:
> On 4/7/20 9:38 AM, Oleg Nesterov wrote:
>> On 04/07, Oleg Nesterov wrote:
>>>
>>> On 04/07, Jens Axboe wrote:
>>>>
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -7293,10 +7293,15 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
>>>>  		io_wq_cancel_all(ctx->io_wq);
>>>>
>>>>  	io_iopoll_reap_events(ctx);
>>>> +	idr_for_each(&ctx->personality_idr, io_remove_personalities, ctx);
>>>> +
>>>> +	if (current->task_works != &task_work_exited)
>>>> +		task_work_run();
>>>
>>> this is still wrong, please see the email I sent a minute ago.
>>
>> Let me try to explain in case it was not clear. Let's forget about io_uring.
>>
>> 	void bad_work_func(struct callback_head *cb)
>> 	{
>> 		task_work_run();
>> 	}
>>
>> 	...
>>
>> 	init_task_work(&my_work, bad_work_func);
>>
>> 	task_work_add(task, &my_work);
>>
>> If the "task" above is exiting, the kernel will crash: the 2nd
>> task_work_run() called by bad_work_func() will install work_exited, then
>> we return to the task_work_run() which was called by exit_task_work(); it
>> will notice ->task_works != NULL, restart the main loop, and execute
>> work_exited.func == NULL.
>>
>> Again, if we want to allow task_work_run() in do_exit() paths we need
>> something like below. But I still do not understand why we need this :/
> 
> The crash I sent was from the exit path, I don't think we need to run
> the task_work for that case, as the ordering should imply that we either
> queue the work with the task (if not exiting), and it'll get run just fine,
> or we queue it with another task. For both those cases, no need to run
> the local task work.
> 
> io_uring exit removes the pending poll requests, but what if (for a
> non-exit invocation) we get poll requests completing before they are
> torn down? Now we have task_work queued up that won't get run, because
> we are in the task_work handler for the __fput(). For this case, we
> need to run the task work.
> 
> But I can't tell them apart easily, hence I don't know when it's safe
> to run it. That's what I'm trying to solve by exposing task_work_exited
> so I can check for that specifically. Not really a great solution as
> it doesn't tell me which of the cases I'm in, but at least it tells me
> if it's safe to run the task work?

It's also possible I totally mis-analyzed it, and it really is back to
"just" being an ordering issue that I then work around by re-running the
task_work within the handler.
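
To make the re-entrancy problem quoted above concrete, here is a minimal
userspace sketch of it. It is a single-threaded model, not kernel code: the
sim_* names, struct sim_task, the null_func_calls counter and
guarded_work_func() are made up for illustration, and the real
task_work_run() uses cmpxchg and reverses the list, which is left out here.
If the model is faithful, it also suggests why checking ->task_works against
the exited marker from inside a handler does not help: by then the outer
loop has already replaced ->task_works with NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only; names mirror the kernel's but this is not kernel code. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

struct sim_task {			/* hypothetical stand-in for task_struct */
	struct callback_head *task_works;
	int exiting;			/* stands in for PF_EXITING */
	int null_func_calls;		/* times the loop hit a NULL ->func */
};

static struct callback_head work_exited;	/* sentinel, ->func stays NULL */
static struct sim_task cur;			/* stands in for "current" */

static void sim_task_work_add(struct callback_head *work)
{
	work->next = cur.task_works;
	cur.task_works = work;
}

static void sim_task_work_run(void)
{
	struct callback_head *work, *next;

	for (;;) {
		work = cur.task_works;
		/* Empty list on an exiting task: install the sentinel. */
		cur.task_works = (!work && cur.exiting) ? &work_exited : NULL;
		if (!work)
			break;
		do {
			next = work->next;
			if (work->func)
				work->func(work);
			else
				cur.null_func_calls++;	/* a real kernel would crash here */
			work = next;
		} while (work);
	}
}

static void noop_work_func(struct callback_head *cb)
{
	(void)cb;
}

/* Same shape as bad_work_func() in the quoted mail: re-enters the run loop. */
static void bad_work_func(struct callback_head *cb)
{
	(void)cb;
	sim_task_work_run();
}

/* Re-enters only if the sentinel is not installed yet. */
static void guarded_work_func(struct callback_head *cb)
{
	(void)cb;
	/* The outer run already set ->task_works to NULL, so this check passes. */
	if (cur.task_works != &work_exited)
		sim_task_work_run();
}

/* Queue one work item, mark the task as exiting, run the list once. */
static int sim_exit_with(void (*func)(struct callback_head *))
{
	struct callback_head my_work = { NULL, func };

	cur.task_works = NULL;
	cur.exiting = 0;
	cur.null_func_calls = 0;
	sim_task_work_add(&my_work);
	cur.exiting = 1;
	sim_task_work_run();		/* what exit_task_work() would do */
	assert(cur.task_works == &work_exited);
	return cur.null_func_calls;
}
```

Calling sim_exit_with() with noop_work_func, bad_work_func and
guarded_work_func in turn returns the number of times the model would have
called a NULL ->func for each handler; a non-zero result corresponds to the
crash described in the quoted explanation.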

-- 
Jens Axboe