On Fri, Feb 07, 2025 at 07:06:54AM -0700, Keith Busch wrote: > On Fri, Feb 07, 2025 at 11:51:49AM +0800, Ming Lei wrote: > > On Mon, Feb 03, 2025 at 07:45:11AM -0800, Keith Busch wrote: > > > > > > The previous version from Ming can be viewed here: > > > > > > https://lore.kernel.org/linux-block/20241107110149.890530-1-ming.lei@xxxxxxxxxx/ > > > > > > Based on the feedback from that thread, the desired io_uring interfaces > > > needed to be simpler, and the kernel registered resources need to behave > > > more similiar to user registered buffers. > > > > > > This series introduces a new resource node type, KBUF, which, like the > > > BUFFER resource, needs to be installed into an io_uring buf_node table > > > in order for the user to access it in a fixed buffer command. The > > > new io_uring kernel API provides a way for a user to register a struct > > > request's bvec to a specific index, and a way to unregister it. > > > > > > When the ublk server receives notification of a new command, it must > > > first select an index and register the zero copy buffer. It may use that > > > index for any number of fixed buffer commands, then it must unregister > > > the index when it's done. This can all be done in a single io_uring_enter > > > if desired, or it can be split into multiple enters if needed. > > > > I suspect it may not be done in single io_uring_enter() because there > > is strict dependency among the three OPs(register buffer, read/write, > > unregister buffer). > > The registration is synchronous. io_uring completes the SQE entirely > before it even looks at the read command in the next SQE. Can you explain a bit "synchronous" here? In patch 4, two ublk uring_cmd(UBLK_U_IO_REGISTER_IO_BUF/UBLK_U_IO_UNREGISTER_IO_BUF) are added, and their handlers are called from uring_cmd's ->issue(). > > The read or write is asynchronous, but it's prep takes a reference on > the node before moving on to the next SQE.. The buffer is registered in ->issue() of UBLK_U_IO_REGISTER_IO_BUF, and it isn't done yet when calling ->prep() of read_fixed/write_fixed, in which buffer is looked up in ->prep(). > > The unregister is synchronous, and clears the index node, but the > possibly inflight read or write has a reference on that node, so all > good. UBLK_U_IO_UNREGISTER_IO_BUF tells ublk that the buffer isn't used any more, but it is being used by the async read/write. It might work, but looks a bit fragile, such as: One buggy application may panic kernel if the IO command is completed before read/write is done. > > > > + ublk_get_sqe_three(q->ring_ptr, ®, &read, &ureg); > > > + > > > + io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag); > > > + > > > + io_uring_prep_read_fixed(read, 1 /*fds[1]*/, > > > + 0, > > > + iod->nr_sectors << 9, > > > + iod->start_sector << 9, > > > + tag); > > > + io_uring_sqe_set_flags(read, IOSQE_FIXED_FILE); > > > + read->user_data = build_user_data(tag, ublk_op, 0, 1); > > > > Does this interface support to read to partial buffer? Which is useful > > for stacking device cases. > > Are you wanting to read into this buffer without copying in parts? As in > provide an offset and/or smaller length across multiple commands? If > that's what you mean, then yes, you can do that here. OK. > > > Also does this interface support to consume the buffer from multiple > > OPs concurrently? > > You can register as many kernel buffers from as many OPs as you have > space for in your table, and you can use them all concurrently. Pretty > much the same as user registered fixed buffers. The main difference from > user buffers is how you register them. Here it depends on if LINK between buffer register and read/write are required. If it is required, multiple OPs consuming the buffer have to be linked one by one, then they can't be issue concurrently. Thanks, Ming