On 10/29/24 1:18 PM, Jens Axboe wrote: > Now, this implementation requires a user buffer, and as far as I'm told, > you currently have kernel buffers on the ublk side. There's absolutely > no reason why kernel buffers cannot work, we'd most likely just need to > add a IORING_RSRC_KBUFFER type to handle that. My question here is how > hard is this requirement? Reason I ask is that it's much simpler to work > with userspace buffers. Yes the current implementation maps them > everytime, we could certainly change that, however I don't see this > being an issue. It's really no different than O_DIRECT, and you only > need to map them once for a read + whatever number of writes you'd need > to do. If a 'tag' is provided for LOCAL_BUF, it'll post a CQE whenever > that buffer is unmapped. This is a notification for the application that > it's done using the buffer. For a pure kernel buffer, we'd either need > to be able to reference it (so that we KNOW it's not going away) and/or > have a callback associated with the buffer. Just to expand on this - if a kernel buffer is absolutely required, for example if you're inheriting pages from the page cache or other locations you cannot control, we would need to add something ala the below: diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 9621ba533b35..b0258eb37681 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -474,6 +474,10 @@ void io_free_rsrc_node(struct io_rsrc_node *node) if (node->buf) io_buffer_unmap(node->ctx, node); break; + case IORING_RSRC_KBUFFER: + if (node->buf) + node->kbuf_fn(node->buf); + break; default: WARN_ON_ONCE(1); break; diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index be9b490c400e..8d00460d47ff 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -11,6 +11,7 @@ enum { IORING_RSRC_FILE = 0, IORING_RSRC_BUFFER = 1, + IORING_RSRC_KBUFFER = 2, }; struct io_rsrc_node { @@ -19,6 +20,7 @@ struct io_rsrc_node { u16 type; u64 tag; + void (*kbuf_fn)(struct io_mapped_ubuf *); union { unsigned long file_ptr; struct io_mapped_ubuf *buf; and provide a helper that allocates an io_rsrc_node, sets it to type IORING_RSRC_KBUFFER, and assigns a ->kbuf_fn() that gets a callback when the final put of the node happens. Whatever ublk command that wants to do zero copy would call this helper at prep time and set the io_submit_state buffer to be used. Likewise, probably provide an io_rsrc helper that can be called by kbuf_fn as well to do final cleanup, so that the callback itself is only tasked with whatever it needs to do once it's received the data. For this to work, we'll absolutely need the provider to guarantee that the pages mapped will remain persistent until that callback is received. Or have a way to reference the data inside rsrc.c. I'm imagining this is just stacking the IO, so eg you get a read with some data already in there that you don't control, and you don't complete this read until some other IO is done. That other IO is what is using the buffer provided here. Anyway, just a suggestion if the user provided memory is a no-go, there are certainly ways we can make this trivially work with memory you cannot control that is received from inside the kernel, without a lot of additions. -- Jens Axboe