Re: [PATCH 1/2] io_uring: clear TIF_NOTIFY_SIGNAL when running task work

Nadav Amit <nadav.amit@xxxxxxxxx> · Tue, 10 Aug 2021 19:33:28 -0700

> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
> 
> On 8/10/21 9:28 AM, Nadav Amit wrote:
>> 
>> Unfortunately, there seems to be yet another issue (unless my code
>> somehow caused it). It seems that when SQPOLL is used, there are cases
>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>> never goes down to zero.
>> 
>> Debugging... (while also trying to make some progress with my code)
> 
> It's most likely because a request has been lost (mis-refcounted).
> Let us know if you need any help. Would be great to solve it for 5.14.
> quick tips: 
> 
> 1) if not already, try out Jens' 5.14 branch
> git://git.kernel.dk/linux-block io_uring-5.14
> 
> 2) try to characterise the io_uring use pattern. Poll requests?
> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
> 
> If easily reproducible, you can match io_alloc_req() with it
> getting into io_dismantle_req();

So the problem actually turns out to be io_uring functionality that is missing and that I need. When an I/O is queued for async completion (i.e., the submission returned -EIOCBQUEUED), io_uring should have a way to cancel it if needed. Otherwise such I/Os might never complete, which is what happens in my use case.

AIO has ki_cancel() for this purpose. So I presume the proper solution would be to move ki_cancel() from aio_kiocb to kiocb, so that both io_uring and aio can use it, and then to use this infrastructure in io_uring.
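
To make the idea concrete, here is a toy userspace model of a cancel hook that lives in the shared request structure. It is not kernel code and not the eventual RFC. Every name ending in _sketch is made up for illustration, and the choice of -EINVAL for "no cancel hook registered" is an assumption as well.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct kiocb_sketch;

/* Hypothetical cancel callback type, modelled on the aio one. */
typedef int (cancel_fn_sketch)(struct kiocb_sketch *);

/* Toy stand-in for the shared request structure (not the real struct kiocb). */
struct kiocb_sketch {
	cancel_fn_sketch *ki_cancel;	/* hook kept here, not in an aio-only wrapper */
	int completed;			/* non-zero once the request has completed */
	long result;			/* completion result */
};

/* A provider would call this before reporting the I/O as queued. */
static void kiocb_set_cancel_fn_sketch(struct kiocb_sketch *iocb,
				       cancel_fn_sketch *cancel)
{
	assert(iocb != NULL);
	/* No assumption about which submitter (aio or io_uring) owns iocb. */
	iocb->ki_cancel = cancel;
}

/* Either submitter could call this while tearing down its context. */
static int kiocb_cancel_sketch(struct kiocb_sketch *iocb)
{
	if (iocb->completed)
		return 0;		/* nothing left to cancel */
	if (!iocb->ki_cancel)
		return -EINVAL;		/* assumed error: request is not cancellable */
	return iocb->ki_cancel(iocb);
}

/* Example provider callback: completes the request as cancelled. */
static int demo_cancel_sketch(struct kiocb_sketch *iocb)
{
	iocb->result = -ECANCELED;
	iocb->completed = 1;
	return 0;
}
```

The point of the model is only that the hook is reachable from the common structure, so the code that registers it does not need to know which submitter it is dealing with.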

But it is messy. There is already a bug in the (few) uses of kiocb_set_cancel_fn(), which blindly assume that AIO is used and not io_uring. Also, I am not sure about some things in the AIO code. Oh boy. I’ll work on an RFC.