Re: [PATCH 4/4] io_uring: flush task work before waiting for ring exit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/9/20 11:50 AM, Oleg Nesterov wrote:
> On 04/08, Jens Axboe wrote:
>>
>> So the question remains, we basically have this:
>>
>> A			B
>> task_work_run(tsk)
>> 			task_work_add(tsk, io_poll_task_func())
>> process cbs
>> wait_for_completion()
>>
>> with the last wait needing to flush the work added on the B side, since
>> that isn't part of the initial list.
> 
> I don't understand you, even remotely :/
> 
> maybe you can write some pseudo-code ?

Sorry, I guess my explanation skills aren't stellar, or perhaps they
assume too much prior knowledge about this.

I just wasted about a day on debugging this because I had Peter's patch
applied, and I should have reviewed that more carefully. So a lot of the
hangs were due to just missing the running of some of the work.

> who does wait_for_completion(), a callback? or this "tsk" after it does
> task_work_run() ? Who does complete() ? How can this wait_for_completion()
> help to flush the work added on the B side? And why do you need to do
> something special to flush that work?

I'll try to explain this. Each io_uring operation has a request
associated with it. Each request grabs a reference to the ctx, and the
ctx can't exit before we reach zero references (obviously). When we
enter ctx teardown, we wait for us to hit zero references. Each ctx is
basically just a file descriptor, which means that we most often end up
waiting for zero references off the fput() path.

Some requests can spawn task_works work items. If we have ordering
issues on the task_works list, we can have the fput() ordered before
items that need to complete. These items are what ultimately put the
request, so you end up in a situation where you block waiting for
requests to complete, but these requests can't complete until the the
current work has completed. This is the deadlock.

Once I got rid of Peter's patch, I solved this by just making the
wait-and-free operation go async. This allows any dependent work to run
and complete just fine. Here's the patch:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.7&id=2baf397719af3a8121fcaba964470950356a4a7a

and I could perhaps make this smarter by checking if current->task_works
!= NULL, but I don't think there's much point to that. The important
part is that we complete the fput() task_work without blocking, so the
remaining work gets a chance to run.

Hope this helps! As mentioned in the commit message, we could have a
separate list of task_works for io_uring, which is what I originally
proposed. But I can also work around it like in the patch referenced
above.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux