Re: [PATCH V8 0/8] io_uring: support sqe group and leased group kbuf

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/29/24 1:18 PM, Jens Axboe wrote:
> Now, this implementation requires a user buffer, and as far as I'm told,
> you currently have kernel buffers on the ublk side. There's absolutely
> no reason why kernel buffers cannot work, we'd most likely just need to
> add a IORING_RSRC_KBUFFER type to handle that. My question here is how
> hard is this requirement? Reason I ask is that it's much simpler to work
> with userspace buffers. Yes the current implementation maps them
> everytime, we could certainly change that, however I don't see this
> being an issue. It's really no different than O_DIRECT, and you only
> need to map them once for a read + whatever number of writes you'd need
> to do. If a 'tag' is provided for LOCAL_BUF, it'll post a CQE whenever
> that buffer is unmapped. This is a notification for the application that
> it's done using the buffer. For a pure kernel buffer, we'd either need
> to be able to reference it (so that we KNOW it's not going away) and/or
> have a callback associated with the buffer.

Just to expand on this - if a kernel buffer is absolutely required, for
example if you're inheriting pages from the page cache or other
locations you cannot control, we would need to add something ala the
below:

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 9621ba533b35..b0258eb37681 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -474,6 +474,10 @@ void io_free_rsrc_node(struct io_rsrc_node *node)
 		if (node->buf)
 			io_buffer_unmap(node->ctx, node);
 		break;
+	case IORING_RSRC_KBUFFER:
+		if (node->buf)
+			node->kbuf_fn(node->buf);
+		break;
 	default:
 		WARN_ON_ONCE(1);
 		break;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index be9b490c400e..8d00460d47ff 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -11,6 +11,7 @@
 enum {
 	IORING_RSRC_FILE		= 0,
 	IORING_RSRC_BUFFER		= 1,
+	IORING_RSRC_KBUFFER		= 2,
 };
 
 struct io_rsrc_node {
@@ -19,6 +20,7 @@ struct io_rsrc_node {
 	u16				type;
 
 	u64 tag;
+	void (*kbuf_fn)(struct io_mapped_ubuf *);
 	union {
 		unsigned long file_ptr;
 		struct io_mapped_ubuf *buf;

and provide a helper that allocates an io_rsrc_node, sets it to type
IORING_RSRC_KBUFFER, and assigns a ->kbuf_fn() that gets a callback when
the final put of the node happens. Whatever ublk command that wants to
do zero copy would call this helper at prep time and set the
io_submit_state buffer to be used.

Likewise, probably provide an io_rsrc helper that can be called by
kbuf_fn as well to do final cleanup, so that the callback itself is only
tasked with whatever it needs to do once it's received the data.

For this to work, we'll absolutely need the provider to guarantee that
the pages mapped will remain persistent until that callback is received.
Or have a way to reference the data inside rsrc.c. I'm imagining this is
just stacking the IO, so eg you get a read with some data already in
there that you don't control, and you don't complete this read until
some other IO is done. That other IO is what is using the buffer
provided here.

Anyway, just a suggestion if the user provided memory is a no-go, there
are certainly ways we can make this trivially work with memory you
cannot control that is received from inside the kernel, without a lot of
additions.

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux