Hi,

Currently, when a ring is being shut down, some cancelations may happen
out-of-line. This means that an application cannot rely on the ring exit
meaning that any IO has fully completed, or someone else waiting on an
application (which has a ring with pending IO) being terminated will
mean that all requests are done. This has also manifested itself as
various testing sometimes finding a mount point busy after a test has
exited, because it may take a brief period of time for things to quiesce
and be fully done.

This patchset makes the task wait on the cancelations, if any, when the
io_uring file fd is being put. That has the effect of ensuring that
pending IO has fully completed, and files closed, before the ring exit
returns.

I did post a previous version of this - fundamentally this one is the
same, with the main difference being that rather than invent our own
type of references for the ring, a basic atomic_long_t is used. io_uring
batches the reference gets and puts on the ring, so this should not be
noticeable. The only potential outlier is setting up a ring without
DEFER_TASKRUN, where running task_work will result in an atomic dec and
inc per ring in running the task_work. We can probably do something
about that, but I don't consider it pressing.

The switch away from percpu reference counts is done mostly because
exiting those references will cost us an RCU grace period. That will
noticeably slow down the ring tear down.

The changes can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-exit-cancel.2

 fs/file_table.c                |  2 +-
 include/linux/io_uring_types.h |  4 +-
 include/linux/sched.h          |  2 +-
 io_uring/io_uring.c            | 79 +++++++++++++++++++++++-----------
 io_uring/io_uring.h            |  3 +-
 io_uring/msg_ring.c            |  4 +-
 io_uring/refs.h                | 43 ++++++++++++++++++
 io_uring/register.c            |  2 +-
 io_uring/rw.c                  |  2 +-
 io_uring/sqpoll.c              |  2 +-
 io_uring/zcrx.c                |  4 +-
 kernel/fork.c                  |  2 +-
 12 files changed, 111 insertions(+), 38 deletions(-)

-- 
Jens Axboe