Re: [PATCH v2 13/13] io_uring: support buffer registration sharing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/18/2020 10:06 AM, Bijan Mottahedeh wrote:

@@ -8415,6 +8421,12 @@ static int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
      if (!data)
          return -ENXIO;
+    if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+        io_detach_buf_data(ctx);
+        ctx->nr_user_bufs = 0;

nr_user_bufs is a part of invariant and should stay together with
stuff in io_detach_buf_data().

Moved to io_detach_buf_data.


@@ -8724,9 +8740,17 @@ static int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
      struct fixed_rsrc_ref_node *ref_node;
      struct fixed_rsrc_data *buf_data;
+    if (ctx->flags & IORING_SETUP_ATTACH_BUF) {
+        if (!ctx->buf_data)
+            return -EFAULT;
+        ctx->nr_user_bufs = ctx->buf_data->ctx->nr_user_bufs;

Why? Once a table is initialised it shouldn't change its size, would
be racy otherwise.

ctx->buf_data is set at ring setup time but the sharing process (SETUP_SHARE) may do the actual buffer registration at an arbitrary time later, so the attaching process must ensure to get the updated value of nr_user_bufs if available.

      buf_data = io_buffers_map_alloc(ctx, nr_args);
      if (IS_ERR(buf_data))
          return PTR_ERR(buf_data);
+    ctx->buf_data = buf_data;

Wanted to write that there is missing
`if (ctx->user_bufs) return -EBUSY`

but apparently it was moved into io_buffers_map_alloc().
I'd really prefer to have it here.

Moved it back.

+static int io_attach_buf_data(struct io_ring_ctx *ctx,
+                  struct io_uring_params *p)
+{
+    struct io_ring_ctx *ctx_attach;
+    struct fd f;
+
+    f = fdget(p->wq_fd);
+    if (!f.file)
+        return -EBADF;
+    if (f.file->f_op != &io_uring_fops) {
+        fdput(f);
+        return -EINVAL;
+    }
+
+    ctx_attach = f.file->private_data;
+    if (!ctx_attach->buf_data) {

It looks racy. What prevents it from being deleted while we're
working on it, e.g. by io_sqe_buffers_unregister?

I think the premise here is that buffer sharing happens between trusted and coordinated processes.  If I understand your concern correctly, then if the sharing process unregisters its buffers after having shared them, than that process is acting improperly.  The race could lead to a failed attach but that would be expected and reasonable I would think?  What do you think should happen in this case?


+        fdput(f);
+        return -EINVAL;
+    }
+    ctx->buf_data = ctx_attach->buf_data;

Before updates, etc. (e.g. __io_sqe_buffers_update()) were synchronised
by uring_lock, now it's modified concurrently, that looks to be really
racy.

Racy from the attaching process perspective you mean?


+
+    percpu_ref_get(&ctx->buf_data->refs);

Ok, now the original io_uring instance will wait until the attached
once get rid of their references. That's a versatile ground to have
in kernel deadlocks.

task1: uring1 = create()
task2: uring2 = create()
task1: uring3 = create(share=uring2);
task2: uring4 = create(share=uring1);

task1: io_sqe_buffers_unregister(uring1)
task2: io_sqe_buffers_unregister(uring2)

If I skimmed through the code right, that should hang unkillably.

So we need a way to enforce that a process can only have one role, sharing or attaching? But I'm not what the best way to do that.  Is this an issue for other resource sharing, work queues or polling thread?


The intended use case for buffer registration is:

- a group of processes attach a shmem segment
- one process registers the buffers in the shmem segment and shares it
- other processes attach that registration

For this case, it seems that there is really no need to wait for the attached processes to get rid of the their references since the shmem segment (and thus the registered buffers) will persist anyway until the last attached process goes away. So the last unregister could quiesce all references and get rid of the shared buf_data.

I'm not sure how useful the non-shmem use case would be anyway.

Would it makes sense to restrict the scope of this feature?





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux