On 1/26/20 10:10 AM, Jens Axboe wrote: > On 1/26/20 3:12 AM, Andres Freund wrote: >> Hi, >> >> On 2019-10-25 11:30:35 -0600, Jens Axboe wrote: >>> This is in preparation for adding opcodes that need to add new files >>> in a process file table, system calls like open(2) or accept4(2). >>> >>> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work >>> item. If work that needs to get punted to async context have this >>> set, the async worker will assume the original task file table before >>> executing the work. >>> >>> Note that opcodes that need access to the current files of an >>> application cannot be done through IORING_SETUP_SQPOLL. >> >> >> Unfortunately this partially breaks sharing a uring across with forked >> off processes, even though it initially appears to work: >> >> >>> +static int io_uring_flush(struct file *file, void *data) >>> +{ >>> + struct io_ring_ctx *ctx = file->private_data; >>> + >>> + io_uring_cancel_files(ctx, data); >>> + if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) >>> + io_wq_cancel_all(ctx->io_wq); >>> + return 0; >>> +} >> >> Once one process having the uring fd open (even if it were just a fork >> never touching the uring, I believe) exits, this prevents the uring from >> being usable for any async tasks. The process exiting closes the fd, >> which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which >> never gets unset, which causes all future async sqes to be be >> immediately returned as -ECANCELLED by the worker, via io_req_cancelled. >> >> It's not clear to me why a close() should cancel the the wq (nor clear >> the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a >> dup()ed fd? Or a fork that immediately exec()s? >> >> After rudely ifdefing out the above if, and reverting 44d282796f81, my >> WIP io_uring using version of postgres appears to pass its tests - which >> are very sparse at this point - again with 5.5-rc7. > > We need to cancel work items using the files from this process if it > exits, but I think we should be fine not canceling all work. Especially > since thet setting of IO_WQ_BIT_CANCEL is a one way street... I'm assuming > the below works for you? Could be even simpler, for shared ring setup, it also doesn't make any sense to flush the cq ring on exit. diff --git a/fs/io_uring.c b/fs/io_uring.c index e5b502091804..e54556b0fcc6 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5044,10 +5044,6 @@ static int io_uring_flush(struct file *file, void *data) struct io_ring_ctx *ctx = file->private_data; io_uring_cancel_files(ctx, data); - if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) { - io_cqring_overflow_flush(ctx, true); - io_wq_cancel_all(ctx->io_wq); - } return 0; } -- Jens Axboe