On 1/26/20 3:12 AM, Andres Freund wrote: > Hi, > > On 2019-10-25 11:30:35 -0600, Jens Axboe wrote: >> This is in preparation for adding opcodes that need to add new files >> in a process file table, system calls like open(2) or accept4(2). >> >> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work >> item. If work that needs to get punted to async context have this >> set, the async worker will assume the original task file table before >> executing the work. >> >> Note that opcodes that need access to the current files of an >> application cannot be done through IORING_SETUP_SQPOLL. > > > Unfortunately this partially breaks sharing a uring across with forked > off processes, even though it initially appears to work: > > >> +static int io_uring_flush(struct file *file, void *data) >> +{ >> + struct io_ring_ctx *ctx = file->private_data; >> + >> + io_uring_cancel_files(ctx, data); >> + if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) >> + io_wq_cancel_all(ctx->io_wq); >> + return 0; >> +} > > Once one process having the uring fd open (even if it were just a fork > never touching the uring, I believe) exits, this prevents the uring from > being usable for any async tasks. The process exiting closes the fd, > which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which > never gets unset, which causes all future async sqes to be be > immediately returned as -ECANCELLED by the worker, via io_req_cancelled. > > It's not clear to me why a close() should cancel the the wq (nor clear > the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a > dup()ed fd? Or a fork that immediately exec()s? > > After rudely ifdefing out the above if, and reverting 44d282796f81, my > WIP io_uring using version of postgres appears to pass its tests - which > are very sparse at this point - again with 5.5-rc7. We need to cancel work items using the files from this process if it exits, but I think we should be fine not canceling all work. Especially since thet setting of IO_WQ_BIT_CANCEL is a one way street... I'm assuming the below works for you? diff --git a/fs/io_uring.c b/fs/io_uring.c index e5b502091804..e3ac2a6ff195 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -5044,10 +5044,8 @@ static int io_uring_flush(struct file *file, void *data) struct io_ring_ctx *ctx = file->private_data; io_uring_cancel_files(ctx, data); - if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) { + if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) io_cqring_overflow_flush(ctx, true); - io_wq_cancel_all(ctx->io_wq); - } return 0; } -- Jens Axboe