Re: [PATCH 3/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/11/23 9:12?AM, Ming Lei wrote:
> On Sat, Feb 11, 2023 at 08:45:18AM -0700, Jens Axboe wrote:
>> On 2/10/23 8:32?AM, Ming Lei wrote:
>>> IORING_OP_READ_SPLICE_BUF: read to buffer which is built from
>>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len)
>>> for building buffer.
>>>
>>> IORING_OP_WRITE_SPLICE_BUF: write from buffer which is built from
>>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len)
>>> for building buffer.
>>>
>>> The typical use case is for supporting ublk/fuse io_uring zero copy,
>>> and READ/WRITE OP retrieves ublk/fuse request buffer via direct pipe
>>> from device->read_splice(), then READ/WRITE can be done to/from this
>>> buffer directly.
>>
>> Main question here - would this be better not plumbed up through the rw
>> path? Might be cleaner, even if it either requires a bit of helper
>> refactoring or accepting a bit of duplication. But would still be better
>> than polluting the rw fast path imho.
> 
> The buffer is actually IO buffer, which has to be plumbed up in IO path,
> and it can't be done like the registered buffer.
> 
> The only affect on fast path is :
> 
> 		if (io_rw_splice_buf(req))	//which just check opcode
>               return io_prep_rw_splice_buf(req, sqe);
> 
> and the cleanup code which is only done for the two new OPs.
> 
> Or maybe I misunderstand your point? Or any detailed suggestion?
> 
> Actually the code should be factored into generic helper, since net.c
> need to use them too. Probably it needs to move to rsrc.c?

Yep, just refactoring out those bits as a prep thing. rsrc could work,
or perhaps a new file for that.

>> Also seems like this should be separately testable. We can't add new
>> opcodes that don't have a feature test at least, and should also have
>> various corner case tests. A bit of commenting outside of this below.
> 
> OK, I will write/add one very simple ublk userspace to liburing for
> test purpose.

Thanks!

>>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>> index 5238ecd7af6a..91e8d8f96134 100644
>>> --- a/io_uring/opdef.c
>>> +++ b/io_uring/opdef.c
>>> @@ -427,6 +427,31 @@ const struct io_issue_def io_issue_defs[] = {
>>>  		.prep			= io_eopnotsupp_prep,
>>>  #endif
>>>  	},
>>> +	[IORING_OP_READ_SPLICE_BUF] = {
>>> +		.needs_file		= 1,
>>> +		.unbound_nonreg_file	= 1,
>>> +		.pollin			= 1,
>>> +		.plug			= 1,
>>> +		.audit_skip		= 1,
>>> +		.ioprio			= 1,
>>> +		.iopoll			= 1,
>>> +		.iopoll_queue		= 1,
>>> +		.prep			= io_prep_rw,
>>> +		.issue			= io_read,
>>> +	},
>>> +	[IORING_OP_WRITE_SPLICE_BUF] = {
>>> +		.needs_file		= 1,
>>> +		.hash_reg_file		= 1,
>>> +		.unbound_nonreg_file	= 1,
>>> +		.pollout		= 1,
>>> +		.plug			= 1,
>>> +		.audit_skip		= 1,
>>> +		.ioprio			= 1,
>>> +		.iopoll			= 1,
>>> +		.iopoll_queue		= 1,
>>> +		.prep			= io_prep_rw,
>>> +		.issue			= io_write,
>>> +	},
>>
>> Are these really safe with iopoll?
> 
> Yeah, after the buffer is built, the handling is basically
> same with IORING_OP_WRITE_FIXED, so I think it is safe.

Yeah, on a second look, as these are just using the normal read/write
path after that should be fine indeed.

>>
>>> +static int io_prep_rw_splice_buf(struct io_kiocb *req,
>>> +				 const struct io_uring_sqe *sqe)
>>> +{
>>> +	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>>> +	unsigned nr_pages = io_rw_splice_buf_nr_bvecs(rw->len);
>>> +	loff_t splice_off = READ_ONCE(sqe->splice_off_in);
>>> +	struct io_rw_splice_buf_data data;
>>> +	struct io_mapped_ubuf *imu;
>>> +	struct fd splice_fd;
>>> +	int ret;
>>> +
>>> +	splice_fd = fdget(READ_ONCE(sqe->splice_fd_in));
>>> +	if (!splice_fd.file)
>>> +		return -EBADF;
>>
>> Seems like this should check for SPLICE_F_FD_IN_FIXED, and also use
>> io_file_get_normal() for the non-fixed case in case someone passed in an
>> io_uring fd.
> 
> SPLICE_F_FD_IN_FIXED needs one extra word for holding splice flags, if
> we can use sqe->addr3, I think it is doable.

I haven't checked the rest, but you can't just use ->splice_flags for
this?

In any case, the get path needs to look like io_tee() here, and:

>>> +out_put_fd:
>>> +	if (splice_fd.file)
>>> +		fdput(splice_fd);

this put needs to be gated on whether it's a fixed file or not.

>> If the operation is done, clear NEED_CLEANUP and do the cleanup here?
>> That'll be faster.
> 
> The buffer has to be cleaned up after req is completed, since bvec
> table is needed for bio, and page reference need to be dropped after
> IO is done too.

I mean when you clear that flag, call the cleanup bits you otherwise
would've called on later cleanup.

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux