Re: [PATCH 2/2] io_uring: add support for passing fixed file descriptors

On 6/18/22 19:34, Jens Axboe wrote:
On 6/18/22 5:02 AM, Hao Xu wrote:
On 6/17/22 21:45, Jens Axboe wrote:
With IORING_OP_MSG_RING, one ring can send a message to another ring.
Extend that support to also allow sending a fixed file descriptor to
that ring, enabling one ring to pass a registered descriptor to another
one.

Arguments are extended to pass in:

sqe->addr3    fixed file slot in source ring
sqe->file_index    fixed file slot in destination ring

IORING_OP_MSG_RING is extended to take a command argument in sqe->addr.
If set to zero (or IORING_MSG_DATA), it sends just a message like before.
If set to IORING_MSG_SEND_FD, a fixed file descriptor is sent according
to the above arguments.
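
As a rough illustration of the argument layout described above, the sketch
below fills in a stand-in structure. struct demo_sqe, demo_prep_send_fd() and
the DEMO_MSG_* constants are hypothetical names invented for this example.
Only the field names addr, addr3 and file_index come from the description;
the fd field (assumed to name the target ring, as with a plain MSG_RING) and
the numeric value of the send-fd command are assumptions, and any encoding
the kernel may apply to file_index is ignored here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the few SQE fields named in the patch
 * description; this is NOT the real struct io_uring_sqe layout. */
struct demo_sqe {
    int32_t  fd;          /* assumed: fd of the target ring */
    uint64_t addr;        /* command: plain message or send-fd */
    uint64_t addr3;       /* fixed file slot in the source ring */
    uint32_t file_index;  /* fixed file slot in the destination ring */
};

enum {
    DEMO_MSG_DATA    = 0, /* zero means "just a message", per the description */
    DEMO_MSG_SEND_FD = 1, /* assumed value, for illustration only */
};

/* Fill a zeroed request that asks for src_slot in the sending ring to be
 * installed at dst_slot in the target ring. */
static void demo_prep_send_fd(struct demo_sqe *sqe, int target_ring_fd,
                              uint32_t src_slot, uint32_t dst_slot)
{
    memset(sqe, 0, sizeof(*sqe));
    sqe->fd = target_ring_fd;
    sqe->addr = DEMO_MSG_SEND_FD;
    sqe->addr3 = src_slot;
    sqe->file_index = dst_slot;
}
```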

Undecided:
     - Should we post a cqe with the send, or require that the sender
       just link a separate IORING_OP_MSG_RING? This makes error
       handling easier, as we cannot easily retract the installed
       file descriptor if the target CQ ring is full. Right now we do
       fill a CQE. If the request completes with -EOVERFLOW, then the
       sender must re-send a CQE if the target must get notified.

Hi Jens,
Since we have the open/accept direct feature, this may be useful. But I
just can't think of a real case where people use two rings and need to
operate on the same fd.

The two cases that people bring up as missing for direct descriptors,
which you can currently handle with a real fd, are:

1) Server needs to be shut down or restarted, pass file descriptors to
    another one

2) Backend is split, and one accepts connections, while others then get
    the fd passed and handle the actual connection.

Both of those are classic SCM_RIGHTS use cases, and it's not possible to
support them with direct descriptors today.

I see, thanks for the detailed explanation.


Assume there are real cases, then filling a cqe is necessary since users
need to first make sure the desired fd is registered before doing
something to it.

Right, my question here was really whether it should be bundled with the
IORING_MSG_SEND_FD operation, or whether the issuer of that should also
be responsible for then posting a "normal" IORING_OP_MSG_RING to the
target ring to notify it of the fact that an fd has been sent to it.

If the operation is split like the latter, then it makes the error
handling a bit easier as we eliminate one failing part of the existing
MSG_SEND_FD.

You could then also pass a number of descriptors and then post a single
OP_MSG_RING with some data that tells you which descriptors were passed.

For the basic use case of just passing a single descriptor, what the
code currently does is probably the sanest approach - send the fd, post
a cqe.

A downside is that users have to take care with fd delivery, especially
when slots are in short supply in target_ctx.

                 ctx                            target_ctx
     msg1(fd1 to target slot x)

     msg2(fd2 to target slot x)

                                              get cqe of msg1
                                   do something to fd1 by access slot x


Here msg2 is issued at the wrong time: it targets slot x before
target_ctx has done its work on fd1. In short, not only does ctx need to
post a cqe to target_ctx to say that the file has been registered,
but target_ctx also has to tell ctx that "my slot x is free now
for you to deliver an fd". So I guess users will be inclined to allocate
a big fixed table and deliver fds to target_ctx in different slots,
which is ok but still a limitation.

I suspect the common use case would be to use the alloc feature, since
the sender generally has no way of knowing which slots are free on the
target ring.

I mean the sender may not easily know which value to set for
msg->dst_fd; I wasn't referring to the alloc feature.
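
To make the slot-selection problem concrete: whoever owns the destination
table is the only side that knows which slots are free. Below is a minimal
sketch of a first-free-slot search over a small fixed table. It is purely
illustrative and is not the kernel's allocator; struct demo_table,
demo_alloc_slot(), demo_free_slot() and DEMO_NR_SLOTS are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_SLOTS 8

/* Hypothetical fixed file table: one "in use" flag per slot. */
struct demo_table {
    bool used[DEMO_NR_SLOTS];
};

/* Claim the lowest free slot. Returns its index, or -1 if the table is full. */
static int demo_alloc_slot(struct demo_table *t)
{
    for (size_t i = 0; i < DEMO_NR_SLOTS; i++) {
        if (!t->used[i]) {
            t->used[i] = true;
            return (int)i;
        }
    }
    return -1;
}

/* Release a slot so a later delivery can reuse it; bad indexes are ignored. */
static void demo_free_slot(struct demo_table *t, int slot)
{
    if (slot >= 0 && slot < DEMO_NR_SLOTS)
        t->used[slot] = false;
}
```

With a scheme like this on the receiving side, the sender never has to guess
a slot number, which sidesteps the msg1/msg2 collision shown in the diagram.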


+static int io_double_lock_ctx(struct io_ring_ctx *ctx,
+                  struct io_ring_ctx *octx,
+                  unsigned int issue_flags)
+{
+    /*
+     * To ensure proper ordering between the two ctxs, we can only
+     * attempt a trylock on the target. If that fails and we already have
+     * the source ctx lock, punt to io-wq.
+     */
+    if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+        if (!mutex_trylock(&octx->uring_lock))
+            return -EAGAIN;
+        return 0;
+    }
+
+    /* Always grab smallest value ctx first. */
+    if (ctx < octx) {
+        mutex_lock(&ctx->uring_lock);
+        mutex_lock(&octx->uring_lock);
+    } else if (ctx > octx) {


Would a simple else work?
if (a < b) {
   lock(a); lock(b);
} else {
   lock(b);lock(a);
}

since a doesn't equal b

Yes that'd be fine, I think I added the a == b pre-check a bit later in
the process. I'll change this to an else instead.
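
For reference, the address-ordered locking with a plain else can be sketched
in userspace with pthread mutexes. This is an illustration of the ordering
idea only, not the patch itself: struct demo_ctx, demo_double_lock() and
demo_double_unlock() are hypothetical names, and the caller is assumed to
have already rejected the a == b case.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a ring context. */
struct demo_ctx {
    pthread_mutex_t lock;
};

/* Lock two distinct contexts in address order, so that two threads locking
 * the same pair with the arguments swapped cannot deadlock.
 * Precondition: a != b (checked by the caller). */
static void demo_double_lock(struct demo_ctx *a, struct demo_ctx *b)
{
    if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

/* Unlock order does not matter for deadlock avoidance. */
static void demo_double_unlock(struct demo_ctx *a, struct demo_ctx *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}
```

The casts to uintptr_t are there because comparing pointers to unrelated
objects with < is not well defined in standard C.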




