Re: [PATCH 4/4] io_uring: flush task work before waiting for ring exit

On 4/8/20 11:40 AM, Oleg Nesterov wrote:
> Jens, I am sorry. I tried to understand your explanations but I can't :/
> Just in case, I know nothing about io_uring.
> 
> However, I strongly believe that
> 
> 	- the "task_work_exited" check in 4/4 can't help, the kernel
> 	  will crash anyway if a task-work callback runs with
> 	  current->task_works == &task_work_exited.
> 
> 	- this check is not needed with the patch I sent.
> 	  UNLESS io_ring_ctx_wait_and_kill() can be called by the exiting
> 	  task AFTER it passes exit_task_work(), but I don't see how this
> 	  is possible.
> 
> Let's forget this problem, let's assume that task_work_run() is always safe.
> 
> I still can not understand why io_ring_ctx_wait_and_kill() needs to call
> task_work_run().
> 
> On 04/07, Jens Axboe wrote:
>>
>> io_uring exit removes the pending poll requests, but what if (for non
>> exit invocation), we get poll requests completing before they are torn
>> down. Now we have task_work queued up that won't get run,
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> this must not be possible. If task_work is queued it will run, or we
> have another bug.
> 
>> because we
>> are in the task_work handler for the __fput().
> 
> this doesn't matter...
> 
>> For this case, we
>> need to run the task work.
> 
> This is what I fail to understand :/

Actually debugging this just now to attempt to get to the bottom of it.
I'm running with Peter's "put fput work at the end at task_work_run
time" patch (with a head == NULL check that was missing). I get a hang
on the wait_for_completion() on io_uring exit, and if I dump the
task_work, this is what I get:

dump_work: dump cb
cb=ffff88bff25589b8, func=ffffffff812f7310	<- io_poll_task_func()
cb=ffff88bfdd164600, func=ffffffff812925e0	<- some __fput()
cb=ffff88bfece13cb8, func=ffffffff812f7310	<- io_poll_task_func()
cb=ffff88bff78393b8, func=ffffffff812b2c40

and we hang because io_poll_task_func() got queued twice on this task
_after_ we yanked the current list of work.
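
To make the ordering concrete, here is a minimal userspace sketch of that situation. It is not kernel code: struct cb_head, struct task_model, work_add(), work_yank() and the other names are hypothetical stand-ins, and the real wait_for_completion() is replaced by a flag so the model never actually blocks. It only assumes what is described above: the pending list is detached as one batch, and the poll callbacks are queued after that detach, while a callback from the detached batch is waiting on them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a per-task callback list. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
};

struct task_model {
	struct cb_head *works;	/* pending callbacks, newest first */
};

/* Queue a callback on the task's pending list. */
static void work_add(struct task_model *t, struct cb_head *cb,
		     void (*func)(struct cb_head *))
{
	cb->func = func;
	cb->next = t->works;
	t->works = cb;
}

/* Detach ("yank") everything currently pending as one batch. */
static struct cb_head *work_yank(struct task_model *t)
{
	struct cb_head *head = t->works;

	t->works = NULL;
	return head;
}

/* Count callbacks still sitting on the pending list. */
static int count_pending(const struct task_model *t)
{
	int n = 0;

	for (const struct cb_head *cb = t->works; cb; cb = cb->next)
		n++;
	return n;
}

static struct task_model task;
static struct cb_head poll_cb1, poll_cb2, fput_cb;
static int polls_done;
static int would_hang;

/* Stand-in for a poll completion handler. */
static void poll_func(struct cb_head *cb)
{
	(void)cb;
	polls_done++;
}

/* Stand-in for the __fput() work that ends up in ring exit. */
static void fput_func(struct cb_head *cb)
{
	(void)cb;
	/* Assumption: poll completions arrive after the list was yanked. */
	work_add(&task, &poll_cb1, poll_func);
	work_add(&task, &poll_cb2, poll_func);
	/* Real code would block here; the model only records that. */
	if (polls_done < 2)
		would_hang = 1;
}

/* Run one detached batch; returns how many callbacks ran. */
static int run_one_batch(struct task_model *t)
{
	struct cb_head *head = work_yank(t);
	int ran = 0;

	while (head) {
		struct cb_head *next = head->next;

		head->func(head);
		head = next;
		ran++;
	}
	return ran;
}

/* Returns 1 if the fput callback would have waited forever. */
static int simulate(void)
{
	task.works = NULL;
	polls_done = 0;
	would_hang = 0;
	work_add(&task, &fput_cb, fput_func);
	run_one_batch(&task);
	return would_hang;
}
```

In this model simulate() reports that the waiter would block, and the two poll callbacks are still on the pending list afterwards. They only run if something yanks and runs another batch, which a blocked callback never gets back to do.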

I'm adding some more debug items to figure out why this is, just wanted
to let you know that I'm currently looking into this and will provide
more data when I have it.

-- 
Jens Axboe



