Re: [PATCH 0/6] ublk zero-copy support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 07, 2025 at 07:06:54AM -0700, Keith Busch wrote:
> On Fri, Feb 07, 2025 at 11:51:49AM +0800, Ming Lei wrote:
> > On Mon, Feb 03, 2025 at 07:45:11AM -0800, Keith Busch wrote:
> > > 
> > > The previous version from Ming can be viewed here:
> > > 
> > >   https://lore.kernel.org/linux-block/20241107110149.890530-1-ming.lei@xxxxxxxxxx/
> > > 
> > > Based on the feedback from that thread, the desired io_uring interfaces
> > > needed to be simpler, and the kernel registered resources need to behave
> > > more similiar to user registered buffers.
> > > 
> > > This series introduces a new resource node type, KBUF, which, like the
> > > BUFFER resource, needs to be installed into an io_uring buf_node table
> > > in order for the user to access it in a fixed buffer command. The
> > > new io_uring kernel API provides a way for a user to register a struct
> > > request's bvec to a specific index, and a way to unregister it.
> > > 
> > > When the ublk server receives notification of a new command, it must
> > > first select an index and register the zero copy buffer. It may use that
> > > index for any number of fixed buffer commands, then it must unregister
> > > the index when it's done. This can all be done in a single io_uring_enter
> > > if desired, or it can be split into multiple enters if needed.
> > 
> > I suspect it may not be done in single io_uring_enter() because there
> > is strict dependency among the three OPs(register buffer, read/write,
> > unregister buffer).
> 
> The registration is synchronous. io_uring completes the SQE entirely
> before it even looks at the read command in the next SQE.

Can you explain a bit "synchronous" here?

In patch 4, two ublk uring_cmd(UBLK_U_IO_REGISTER_IO_BUF/UBLK_U_IO_UNREGISTER_IO_BUF)
are added, and their handlers are called from uring_cmd's ->issue().

> 
> The read or write is asynchronous, but it's prep takes a reference on
> the node before moving on to the next SQE..

The buffer is registered in ->issue() of UBLK_U_IO_REGISTER_IO_BUF,
and it isn't done yet when calling ->prep() of read_fixed/write_fixed,
in which buffer is looked up in ->prep().

> 
> The unregister is synchronous, and clears the index node, but the
> possibly inflight read or write has a reference on that node, so all
> good.

UBLK_U_IO_UNREGISTER_IO_BUF tells ublk that the buffer isn't used any
more, but it is being used by the async read/write.

It might work, but looks a bit fragile, such as:

One buggy application may panic kernel if the IO command is completed
before read/write is done.

> 
> > > +		ublk_get_sqe_three(q->ring_ptr, &reg, &read, &ureg);
> > > +
> > > +		io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag);
> > > +
> > > +		io_uring_prep_read_fixed(read, 1 /*fds[1]*/,
> > > +			0,
> > > +			iod->nr_sectors << 9,
> > > +			iod->start_sector << 9,
> > > +			tag);
> > > +		io_uring_sqe_set_flags(read, IOSQE_FIXED_FILE);
> > > +		read->user_data = build_user_data(tag, ublk_op, 0, 1);
> > 
> > Does this interface support to read to partial buffer? Which is useful
> > for stacking device cases.
> 
> Are you wanting to read into this buffer without copying in parts? As in
> provide an offset and/or smaller length across multiple commands? If
> that's what you mean, then yes, you can do that here.

OK.

>  
> > Also does this interface support to consume the buffer from multiple
> > OPs concurrently? 
> 
> You can register as many kernel buffers from as many OPs as you have
> space for in your table, and you can use them all concurrently. Pretty
> much the same as user registered fixed buffers. The main difference from
> user buffers is how you register them.

Here it depends on if LINK between buffer register and read/write are
required. If it is required, multiple OPs consuming the buffer have to
be linked one by one, then they can't be issue concurrently.


Thanks,
Ming





[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux