Re: [PATCH v2 13/13] io_uring: support buffer registration sharing

Bijan Mottahedeh <bijan.mottahedeh@xxxxxxxxxx> · Fri, 18 Dec 2020 10:06:24 -0800





@@ -8415,6 +8421,12 @@ static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
  	if (!data)
  		return -ENXIO;
  
+	if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+		io_detach_buf_data(ctx);
+		ctx->nr_user_bufs = 0;

nr_user_bufs is a part of invariant and should stay together with
stuff in io_detach_buf_data().

Moved to io_detach_buf_data.


@@ -8724,9 +8740,17 @@ static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
  	struct fixed_rsrc_ref_node *ref_node;
  	struct fixed_rsrc_data *buf_data;
  
+	if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+		if (!ctx->buf_data)
+			return -EFAULT;
+		ctx->nr_user_bufs = ctx->buf_data->ctx->nr_user_bufs;

Why? Once a table is initialised it shouldn't change its size, would
be racy otherwise.

ctx->buf_data is set at ring setup time but the sharing process 
(SETUP_SHARE) may do the actual buffer registration at an arbitrary time 
later, so the attaching process must ensure to get the updated value of 
nr_user_bufs if available.

  	buf_data = io_buffers_map_alloc(ctx, nr_args);
  	if (IS_ERR(buf_data))
  		return PTR_ERR(buf_data);
+	ctx->buf_data = buf_data;

Wanted to write that there is missing
`if (ctx->user_bufs) return -EBUSY`

but apparently it was moved into io_buffers_map_alloc().
I'd really prefer to have it here.

Moved it back.

+static int io_attach_buf_data(struct io_ring_ctx *ctx,
+			      struct io_uring_params *p)
+{
+	struct io_ring_ctx *ctx_attach;
+	struct fd f;
+
+	f = fdget(p->wq_fd);
+	if (!f.file)
+		return -EBADF;
+	if (f.file->f_op != &io_uring_fops) {
+		fdput(f);
+		return -EINVAL;
+	}
+
+	ctx_attach = f.file->private_data;
+	if (!ctx_attach->buf_data) {

It looks racy. What prevents it from being deleted while we're
working on it, e.g. by io_sqe_buffers_unregister?

I think the premise here is that buffer sharing happens between trusted 
and coordinated processes.  If I understand your concern correctly, then 
if the sharing process unregisters its buffers after having shared them, 
than that process is acting improperly.  The race could lead to a failed 
attach but that would be expected and reasonable I would think?  What do 
you think should happen in this case?


+		fdput(f);
+		return -EINVAL;
+	}
+	ctx->buf_data = ctx_attach->buf_data;

Before updates, etc. (e.g. __io_sqe_buffers_update()) were synchronised
by uring_lock, now it's modified concurrently, that looks to be really
racy.

Racy from the attaching process perspective you mean?


+
+	percpu_ref_get(&ctx->buf_data->refs);

Ok, now the original io_uring instance will wait until the attached
once get rid of their references. That's a versatile ground to have
in kernel deadlocks.

task1: uring1 = create()
task2: uring2 = create()
task1: uring3 = create(share=uring2);
task2: uring4 = create(share=uring1);

task1: io_sqe_buffers_unregister(uring1)
task2: io_sqe_buffers_unregister(uring2)

If I skimmed through the code right, that should hang unkillably.

So we need a way to enforce that a process can only have one role, 
sharing or attaching? But I'm not what the best way to do that.  Is this 
an issue for other resource sharing, work queues or polling thread?