On 10/24/23 6:06 PM, Dave Chinner wrote:
> On Tue, Oct 24, 2023 at 12:35:26PM -0600, Jens Axboe wrote:
>> On 10/24/23 8:30 AM, Jens Axboe wrote:
>>> I don't think this is related to the io-wq workers doing non-blocking
>>> IO.
>
> The io-wq worker that has deadlocked _must_ be doing blocking IO. If
> it was doing non-blocking IO (i.e. IOCB_NOWAIT) then it would have
> done a trylock and returned -EAGAIN to the worker for it to try
> again later. I'm not sure that would avoid the issue, however - it
> seems to me like it might just turn it into a livelock rather than a
> deadlock....

Sorry, typo, yes they are doing blocking IO, that's all they ever do. My
point is that it's not related to the issue.

>>> The callback is eventually executed by the task that originally
>>> submitted the IO, which is the owner and not the async workers. But...
>>> If that original task is blocked in eg fallocate, then I can see how
>>> that would potentially be an issue.
>>>
>>> I'll take a closer look.
>>
>> I think the best way to fix this is likely to have inode_dio_wait() be
>> interruptible, and return -ERESTARTSYS if it should be restarted. Now
>> the below is obviously not a full patch, but I suspect it'll make ext4
>> and xfs tick, because they should both be affected.
>
> How does that solve the problem? Nothing will issue a signal to the
> process that is waiting in inode_dio_wait() except userspace, so I
> can't see how this does anything to solve the problem at hand...

Except task_work, which when it completes, will decrement the i_dio
count again. This is the whole point of the half-assed patch I sent
out.

> I'm also very leary of adding new error handling complexity to paths
> like truncate, extent cloning, fallocate, etc which expect to block
> on locks until they can perform the operation safely.

I actually looked at all of them, ext4 and xfs specifically. It really
doesn't seem too bad.
> On further thinking, this could be a self deadlock with
> just async direct IO submission - submit an async DIO with
> IOCB_CALLER_COMP, then run an unaligned async DIO that attempts to
> drain in-flight DIO before continuing. Then the thread waits in
> inode_dio_wait() because it can't run the completion that will drop
> the i_dio_count to zero.

No, because those will be non-blocking. Any blocking IO will go via
io-wq, and that won't then hit the deadlock. If you're doing
inode_dio_wait() from the task itself for a non-blocking issue, then
that would surely be an issue. But we should not be doing that, and we
are checking for it.

> Hence it appears to me that we've missed some critical constraints
> around nesting IO submission and completion when using
> IOCB_CALLER_COMP. Further, it really isn't clear to me how deep the
> scope of this problem is yet, let alone what the solution might be.

I think you're missing exactly what the deadlock is.

> With all this in mind, and how late this is in the 6.6 cycle, can we
> just revert the IOCB_CALLER_COMP changes for now?

Yeah, I'm going to do a revert of the io_uring side, which effectively
disables it. Then a revised series can be done, and when done, we could
bring it back.

-- 
Jens Axboe