> > After reading the io_uring_enter(2) man page a IORING_OP_ASYNC_CANCEL's return value of -EALREADY apparently
> > may not cause the request to terminate. At least that is our interpretation of "…res field will contain -EALREADY.
> > In this case, the request may or may not terminate."
>
> I took a look at this, and my theory is that the request cancelation
> ends up happening right in between when the work item is moved between
> the work list and to the worker itself. The way the async queue works,
> the work item is sitting in a list until it gets assigned by a worker.
> When that assignment happens, it's removed from the general work list
> and then assigned to the worker itself. There's a small gap there where
> the work cannot be found in the general list, and isn't yet findable in
> the worker itself either.
>
> Do you always see -ENOENT from the cancel when you get the hang
> condition?

No, we also, and actually more commonly, observe the cancel returning
-EALREADY while the canceled read request never completes, as shown in
the log snippet included below.

> > The far more common situation with the reproducer and adding 1 to the eventfds in each loop
> > is that a request is not canceled and the cancel attempt returned with -EALREADY.
> > There is no progress because the writer has already finished its loop and the cancel
> > apparently does not really cancel the request.
> >
> > 1 Starting iteration 996
> > 1 Prepared read request (evfd: 1, tag: 1)
> > 1 Submitted 1 requests -> 1 inflight
> > 1 Prepared read request (evfd: 2, tag: 2)
> > 1 Submitted 1 requests -> 2 inflight
> > 1 Prepared write request (evfd: 0)
> > 1 Submitted 1 requests -> 3 inflight
> > 1 Collect write completion: 8
> > 1 Prepared cancel request for read 1
> > 1 Prepared cancel request for read 2
> > 1 Submitted 2 requests -> 4 inflight
> > 1 Collect read 1 completion: -125 - Operation canceled
> > 1 Collect cancel read 1 completion: 0
> > 1 Collect cancel read 2 completion: -114 - Operation already in progress

^- the cancel returned with -EALREADY but the cancelled read
(the second prepared read request) is never completed.

> I'll play with this a bit and see if we can't close this hole so the
> work is always reliably discoverable (and hence can get canceled).

Thanks for your effort!
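
For reference, a minimal sketch of how a caller might interpret the cancel
completion codes discussed above. It is not taken from the reproducer; the
names classify_cancel_res and target_completion_expected are hypothetical,
and the rule "only res == 0 lets us wait for the target" is an assumption
drawn from the log above and the quoted man page text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of the res field of the CQE that belongs
 * to an IORING_OP_ASYNC_CANCEL request. */
enum cancel_outcome {
    CANCEL_DONE,        /* res == 0: target was found and canceled */
    CANCEL_NOT_FOUND,   /* res == -ENOENT: target was not found */
    CANCEL_IN_PROGRESS, /* res == -EALREADY: target may or may not terminate */
    CANCEL_OTHER_ERROR  /* any other value */
};

enum cancel_outcome classify_cancel_res(int res)
{
    switch (res) {
    case 0:
        return CANCEL_DONE;
    case -ENOENT:
        return CANCEL_NOT_FOUND;
    case -EALREADY:
        return CANCEL_IN_PROGRESS;
    default:
        return CANCEL_OTHER_ERROR;
    }
}

/* Assumption: only a cancel result of 0 lets the caller expect a prompt
 * completion of the target (with -ECANCELED, as for read 1 in the log).
 * For -EALREADY the man page says the request may or may not terminate,
 * and -ENOENT might also be seen while the work item sits in the gap
 * described in this thread, so the caller should not block on the
 * target in either case. */
bool target_completion_expected(int res)
{
    enum cancel_outcome outcome = classify_cancel_res(res);

    assert(outcome >= CANCEL_DONE && outcome <= CANCEL_OTHER_ERROR);
    return outcome == CANCEL_DONE;
}
```

With the values from the log, the cancel for read 1 (res 0) would be
classified as CANCEL_DONE, and the cancel for read 2 (res -114, -EALREADY
on Linux) as CANCEL_IN_PROGRESS, which is the case where we hang.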