> On Aug 10, 2021, at 2:32 PM, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>
> On 8/10/21 9:28 AM, Nadav Amit wrote:
>>
>> Unfortunately, there seems to be yet another issue (unless my code
>> somehow caused it). It seems that when SQPOLL is used, there are cases
>> in which we get stuck in io_uring_cancel_sqpoll() when tctx_inflight()
>> never goes down to zero.
>>
>> Debugging... (while also trying to make some progress with my code)
>
> It's most likely because a request has been lost (mis-refcounted).
> Let us know if you need any help. Would be great to solve it for 5.14.
> quick tips:
>
> 1) if not already, try out Jens' 5.14 branch
> git://git.kernel.dk/linux-block io_uring-5.14
>
> 2) try to characterise the io_uring use pattern. Poll requests?
> Read/write requests? Send/recv? Filesystem vs bdev vs sockets?
>
> If easily reproducible, you can match io_alloc_req() with it
> getting into io_dismantle_req();

So actually the problem is more a matter of missing io_uring functionality
that I need. When an I/O is queued for async completion (i.e., after
returning -EIOCBQUEUED), there should be a way for io_uring to cancel it
if needed. Otherwise such I/Os might never complete, as happens in my
use-case.

AIO has ki_cancel() for this purpose. So I presume the proper solution
would be to move ki_cancel() from aio_kiocb to kiocb, so that it can be
used by both io_uring and aio, and then to use this infrastructure in
io_uring.

But it is messy. There is already a bug in the (few) users of
kiocb_set_cancel_fn(): they blindly assume that AIO is used and not
io_uring. Also, I am not sure about some things in the AIO code.

Oh boy. I’ll work on an RFC.
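
To make the idea a bit more concrete, here is a rough userspace model of
what moving the cancel hook into kiocb could look like. This is only a
sketch, not kernel code and not the RFC: ki_cancel and
kiocb_set_cancel_fn() are the names used above, while kiocb_cancel(),
demo_cancel() and demo_cancel_calls are made up for illustration, and the
struct is reduced to the single field that matters here.

```c
/*
 * Userspace model only, NOT kernel code. ki_cancel and
 * kiocb_set_cancel_fn() are names from this thread; kiocb_cancel(),
 * demo_cancel() and demo_cancel_calls are hypothetical.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct kiocb;
typedef int (kiocb_cancel_fn)(struct kiocb *);

/* Assumed layout: the cancel hook lives in kiocb itself. */
struct kiocb {
	kiocb_cancel_fn *ki_cancel;	/* NULL until a driver registers one */
};

/*
 * Driver side: register a cancel callback. Nothing here assumes that
 * the kiocb is embedded in an AIO request.
 */
static void kiocb_set_cancel_fn(struct kiocb *iocb, kiocb_cancel_fn *cancel)
{
	assert(iocb != NULL);
	iocb->ki_cancel = cancel;
}

/*
 * Submitter side (aio or io_uring alike): try to cancel a queued I/O.
 * Returns -EINVAL if no callback is registered.
 */
static int kiocb_cancel(struct kiocb *iocb)
{
	kiocb_cancel_fn *cancel = iocb->ki_cancel;

	if (!cancel)
		return -EINVAL;
	iocb->ki_cancel = NULL;		/* invoke the callback at most once */
	return cancel(iocb);
}

/* Hypothetical driver callback: only counts how often it was called. */
static int demo_cancel_calls;

static int demo_cancel(struct kiocb *iocb)
{
	(void)iocb;
	demo_cancel_calls++;
	return 0;
}
```

In this model a driver that returns -EIOCBQUEUED would first call
kiocb_set_cancel_fn(iocb, demo_cancel), and either submitter could later
call kiocb_cancel(iocb). The real thing would of course need locking
against completion, which the model ignores entirely.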