Re: [PATCH 2/2] io_uring: add support for passing fixed file descriptors

On 6/17/22 21:45, Jens Axboe wrote:
With IORING_OP_MSG_RING, one ring can send a message to another ring.
Extend that support to also allow sending a fixed file descriptor to
that ring, enabling one ring to pass a registered descriptor to another
one.

Arguments are extended to pass in:

sqe->addr3	fixed file slot in source ring
sqe->file_index	fixed file slot in destination ring

IORING_OP_MSG_RING is extended to take a command argument in sqe->addr.
If set to zero (or IORING_MSG_DATA), it sends just a message like before.
If set to IORING_MSG_SEND_FD, a fixed file descriptor is sent according
to the above arguments.

Undecided:
	- Should we post a cqe with the send, or require that the sender
	  just link a separate IORING_OP_MSG_RING? This makes error
	  handling easier, as we cannot easily retract the installed
	  file descriptor if the target CQ ring is full. Right now we do
	  fill a CQE. If the request completes with -EOVERFLOW, then the
	  sender must re-send a CQE if the target must get notified.

Hi Jens,
Since we have the open/accept direct feature, this may be useful. But I
just can't think of a real case where people use two rings and need to
operate on the same fd.
Assuming there are real cases, filling a cqe is necessary, since users
need to first make sure the desired fd is registered before doing
something with it.

A downside is that users have to be careful about fd delivery, especially
when slots are in short supply in target_ctx.

                ctx                            target_ctx
    msg1(fd1 to target slot x)

    msg2(fd2 to target slot x)

                                             get cqe of msg1
                                  do something to fd1 by access slot x


Here msg2 is issued at the wrong time: it targets slot x before target_ctx
has used fd1. In short, not only does ctx need to fill a cqe in target_ctx
to say that the file has been registered, but target_ctx also has to tell
ctx "my slot x is free now for you to deliver an fd". So I guess users will
be inclined to allocate a big fixed table and deliver fds to target_ctx in
different slots, which is ok but still a limitation.
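
The bookkeeping this implies on the sender side could look roughly like the
sketch below. It is only a userspace model in plain C, not kernel or liburing
code: slot_tracker, slot_acquire, slot_release and NR_SLOTS are hypothetical
names, and how target_ctx reports "slot x is free" back to ctx is left open.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 4	/* hypothetical size of the target's fixed file table */

/* Sender-side view of which target slots currently hold a delivered fd. */
struct slot_tracker {
	bool busy[NR_SLOTS];
};

/* Pick a free target slot for the next fd, or return -1 if none is free. */
static int slot_acquire(struct slot_tracker *t)
{
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!t->busy[i]) {
			t->busy[i] = true;
			return i;
		}
	}
	return -1;
}

/* Called once the target has reported that it is done with this slot. */
static void slot_release(struct slot_tracker *t, int slot)
{
	assert(slot >= 0 && slot < NR_SLOTS);
	t->busy[slot] = false;
}
```

With something like this, msg2 would only be issued to slot x after
slot_release() has run for x, or it would go to a different free slot.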


	- Add an IORING_MSG_MOVE_FD which moves the descriptor, removing
	  it from the source ring when installed in the target? Again
	  error handling is difficult.

Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
---
  include/uapi/linux/io_uring.h |   8 +++
  io_uring/msg_ring.c           | 122 ++++++++++++++++++++++++++++++++--
  2 files changed, 123 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8715f0942ec2..dbdaeef3ea89 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -264,6 +264,14 @@ enum io_uring_op {
   */
  #define IORING_ACCEPT_MULTISHOT	(1U << 0)
+/*
+ * IORING_OP_MSG_RING command types, stored in sqe->addr
+ */
+enum {
+	IORING_MSG_DATA,	/* pass sqe->len as 'res' and off as user_data */
+	IORING_MSG_SEND_FD,	/* send a registered fd to another ring */
+};
+
  /*
   * IO completion data structure (Completion Queue Entry)
   */
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index b02be2349652..e9d6fb25d141 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -3,46 +3,154 @@
  #include <linux/errno.h>
  #include <linux/file.h>
  #include <linux/slab.h>
+#include <linux/nospec.h>
  #include <linux/io_uring.h>
  #include <uapi/linux/io_uring.h>
  #include "io_uring.h"
+#include "rsrc.h"
+#include "filetable.h"
  #include "msg_ring.h"
struct io_msg {
  	struct file			*file;
  	u64 user_data;
  	u32 len;
+	u32 cmd;
+	u32 src_fd;
+	u32 dst_fd;
  };
+static int io_msg_ring_data(struct io_kiocb *req)
+{
+	struct io_ring_ctx *target_ctx = req->file->private_data;
+	struct io_msg *msg = io_kiocb_to_cmd(req);
+
+	if (msg->src_fd || msg->dst_fd)
+		return -EINVAL;
+
+	if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+		return 0;
+
+	return -EOVERFLOW;
+}
+
+static void io_double_unlock_ctx(struct io_ring_ctx *ctx,
+				 struct io_ring_ctx *octx,
+				 unsigned int issue_flags)
+{
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		mutex_unlock(&ctx->uring_lock);
+	mutex_unlock(&octx->uring_lock);
+}
+
+static int io_double_lock_ctx(struct io_ring_ctx *ctx,
+			      struct io_ring_ctx *octx,
+			      unsigned int issue_flags)
+{
+	/*
+	 * To ensure proper ordering between the two ctxs, we can only
+	 * attempt a trylock on the target. If that fails and we already have
+	 * the source ctx lock, punt to io-wq.
+	 */
+	if (!(issue_flags & IO_URING_F_UNLOCKED)) {
+		if (!mutex_trylock(&octx->uring_lock))
+			return -EAGAIN;
+		return 0;
+	}
+
+	/* Always grab smallest value ctx first. */
+	if (ctx < octx) {
+		mutex_lock(&ctx->uring_lock);
+		mutex_lock(&octx->uring_lock);
+	} else if (ctx > octx) {


Would a simple else work?
if (a < b) {
	lock(a); lock(b);
} else {
	lock(b); lock(a);
}

since a doesn't equal b
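
To spell out the suggestion, here is a minimal userspace sketch using pthread
mutexes in place of uring_lock. struct ctx, double_lock and double_unlock are
hypothetical stand-ins, not the kernel helpers, and the sketch relies on the
assumption stated above that the two contexts are never the same object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct io_ring_ctx. */
struct ctx {
	pthread_mutex_t lock;
};

/* Lock both contexts, lowest address first, so all callers agree on order. */
static void double_lock(struct ctx *a, struct ctx *b)
{
	/* With a plain else, a == b would lock the same mutex twice. */
	assert(a != b);

	if ((uintptr_t)a < (uintptr_t)b) {
		pthread_mutex_lock(&a->lock);
		pthread_mutex_lock(&b->lock);
	} else {
		pthread_mutex_lock(&b->lock);
		pthread_mutex_lock(&a->lock);
	}
}

/* Unlock order does not matter for deadlock avoidance. */
static void double_unlock(struct ctx *a, struct ctx *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

If a == b can ever happen, the explicit else-if on (a > b) is what keeps that
case from taking the same lock twice.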





