On 2/11/23 9:12?AM, Ming Lei wrote: > On Sat, Feb 11, 2023 at 08:45:18AM -0700, Jens Axboe wrote: >> On 2/10/23 8:32?AM, Ming Lei wrote: >>> IORING_OP_READ_SPLICE_BUF: read to buffer which is built from >>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len) >>> for building buffer. >>> >>> IORING_OP_WRITE_SPLICE_BUF: write from buffer which is built from >>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len) >>> for building buffer. >>> >>> The typical use case is for supporting ublk/fuse io_uring zero copy, >>> and READ/WRITE OP retrieves ublk/fuse request buffer via direct pipe >>> from device->read_splice(), then READ/WRITE can be done to/from this >>> buffer directly. >> >> Main question here - would this be better not plumbed up through the rw >> path? Might be cleaner, even if it either requires a bit of helper >> refactoring or accepting a bit of duplication. But would still be better >> than polluting the rw fast path imho. > > The buffer is actually IO buffer, which has to be plumbed up in IO path, > and it can't be done like the registered buffer. > > The only affect on fast path is : > > if (io_rw_splice_buf(req)) //which just check opcode > return io_prep_rw_splice_buf(req, sqe); > > and the cleanup code which is only done for the two new OPs. > > Or maybe I misunderstand your point? Or any detailed suggestion? > > Actually the code should be factored into generic helper, since net.c > need to use them too. Probably it needs to move to rsrc.c? Yep, just refactoring out those bits as a prep thing. rsrc could work, or perhaps a new file for that. >> Also seems like this should be separately testable. We can't add new >> opcodes that don't have a feature test at least, and should also have >> various corner case tests. A bit of commenting outside of this below. > > OK, I will write/add one very simple ublk userspace to liburing for > test purpose. Thanks! >>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c >>> index 5238ecd7af6a..91e8d8f96134 100644 >>> --- a/io_uring/opdef.c >>> +++ b/io_uring/opdef.c >>> @@ -427,6 +427,31 @@ const struct io_issue_def io_issue_defs[] = { >>> .prep = io_eopnotsupp_prep, >>> #endif >>> }, >>> + [IORING_OP_READ_SPLICE_BUF] = { >>> + .needs_file = 1, >>> + .unbound_nonreg_file = 1, >>> + .pollin = 1, >>> + .plug = 1, >>> + .audit_skip = 1, >>> + .ioprio = 1, >>> + .iopoll = 1, >>> + .iopoll_queue = 1, >>> + .prep = io_prep_rw, >>> + .issue = io_read, >>> + }, >>> + [IORING_OP_WRITE_SPLICE_BUF] = { >>> + .needs_file = 1, >>> + .hash_reg_file = 1, >>> + .unbound_nonreg_file = 1, >>> + .pollout = 1, >>> + .plug = 1, >>> + .audit_skip = 1, >>> + .ioprio = 1, >>> + .iopoll = 1, >>> + .iopoll_queue = 1, >>> + .prep = io_prep_rw, >>> + .issue = io_write, >>> + }, >> >> Are these really safe with iopoll? > > Yeah, after the buffer is built, the handling is basically > same with IORING_OP_WRITE_FIXED, so I think it is safe. Yeah, on a second look, as these are just using the normal read/write path after that should be fine indeed. >> >>> +static int io_prep_rw_splice_buf(struct io_kiocb *req, >>> + const struct io_uring_sqe *sqe) >>> +{ >>> + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); >>> + unsigned nr_pages = io_rw_splice_buf_nr_bvecs(rw->len); >>> + loff_t splice_off = READ_ONCE(sqe->splice_off_in); >>> + struct io_rw_splice_buf_data data; >>> + struct io_mapped_ubuf *imu; >>> + struct fd splice_fd; >>> + int ret; >>> + >>> + splice_fd = fdget(READ_ONCE(sqe->splice_fd_in)); >>> + if (!splice_fd.file) >>> + return -EBADF; >> >> Seems like this should check for SPLICE_F_FD_IN_FIXED, and also use >> io_file_get_normal() for the non-fixed case in case someone passed in an >> io_uring fd. > > SPLICE_F_FD_IN_FIXED needs one extra word for holding splice flags, if > we can use sqe->addr3, I think it is doable. I haven't checked the rest, but you can't just use ->splice_flags for this? In any case, the get path needs to look like io_tee() here, and: >>> +out_put_fd: >>> + if (splice_fd.file) >>> + fdput(splice_fd); this put needs to be gated on whether it's a fixed file or not. >> If the operation is done, clear NEED_CLEANUP and do the cleanup here? >> That'll be faster. > > The buffer has to be cleaned up after req is completed, since bvec > table is needed for bio, and page reference need to be dropped after > IO is done too. I mean when you clear that flag, call the cleanup bits you otherwise would've called on later cleanup. -- Jens Axboe