Re: [PATCH 2/4] io_uring: io_uring: add support for async work inheriting files

Jens Axboe <axboe@xxxxxxxxx> · Sun, 26 Jan 2020 10:17:00 -0700

On 1/26/20 10:10 AM, Jens Axboe wrote:
> On 1/26/20 3:12 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2019-10-25 11:30:35 -0600, Jens Axboe wrote:
>>> This is in preparation for adding opcodes that need to add new files
>>> in a process file table, system calls like open(2) or accept4(2).
>>>
>>> If an opcode needs this, it must set IO_WQ_WORK_NEEDS_FILES in the work
>>> item. If work that needs to get punted to async context have this
>>> set, the async worker will assume the original task file table before
>>> executing the work.
>>>
>>> Note that opcodes that need access to the current files of an
>>> application cannot be done through IORING_SETUP_SQPOLL.
>>
>>
>> Unfortunately this partially breaks sharing a uring across with forked
>> off processes, even though it initially appears to work:
>>
>>
>>> +static int io_uring_flush(struct file *file, void *data)
>>> +{
>>> +	struct io_ring_ctx *ctx = file->private_data;
>>> +
>>> +	io_uring_cancel_files(ctx, data);
>>> +	if (fatal_signal_pending(current) || (current->flags & PF_EXITING))
>>> +		io_wq_cancel_all(ctx->io_wq);
>>> +	return 0;
>>> +}
>>
>> Once one process having the uring fd open (even if it were just a fork
>> never touching the uring, I believe) exits, this prevents the uring from
>> being usable for any async tasks. The process exiting closes the fd,
>> which triggers flush. io_wq_cancel_all() sets IO_WQ_BIT_CANCEL, which
>> never gets unset, which causes all future async sqes to be be
>> immediately returned as -ECANCELLED by the worker, via io_req_cancelled.
>>
>> It's not clear to me why a close() should cancel the the wq (nor clear
>> the entire backlog, after 1d7bb1d50fb4)? Couldn't that even just be a
>> dup()ed fd? Or a fork that immediately exec()s?
>>
>> After rudely ifdefing out the above if, and reverting 44d282796f81, my
>> WIP io_uring using version of postgres appears to pass its tests - which
>> are very sparse at this point - again with 5.5-rc7.
> 
> We need to cancel work items using the files from this process if it
> exits, but I think we should be fine not canceling all work. Especially
> since thet setting of IO_WQ_BIT_CANCEL is a one way street...  I'm assuming
> the below works for you?

Could be even simpler, for shared ring setup, it also doesn't make any sense
to flush the cq ring on exit.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index e5b502091804..e54556b0fcc6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5044,10 +5044,6 @@ static int io_uring_flush(struct file *file, void *data)
 	struct io_ring_ctx *ctx = file->private_data;
 
 	io_uring_cancel_files(ctx, data);
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING)) {
-		io_cqring_overflow_flush(ctx, true);
-		io_wq_cancel_all(ctx->io_wq);
-	}
 	return 0;
 }
 

-- 
Jens Axboe