On Mon, Apr 4, 2022 at 12:46 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> Can we please spell out infrastructure here? Or did you mean
> infrared anyway :)

:-) sure, I see the problem with this shorthand now.

> > -enum io_uring_cmd_flags {
> > -	IO_URING_F_COMPLETE_DEFER	= 1,
> > -	IO_URING_F_UNLOCKED		= 2,
> > -	/* int's last bit, sign checks are usually faster than a bit test */
> > -	IO_URING_F_NONBLOCK		= INT_MIN,
> > -};
>
> This doesn't actually get used anywhere outside of io_uring.c, so why
> move it?

This got missed. We are going to use it for things like this in nvme
(from the past series):

+	if (ioucmd && (ioucmd->flags & IO_URING_F_NONBLOCK)) {
+		rq_flags |= REQ_NOWAIT;
+		blk_flags |= BLK_MQ_REQ_NOWAIT;
+	}
+	req = nvme_alloc_request(q, cmd, blk_flags, rq_flags);

Also for polling. Will fix this up in the next version.

> > +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
> > +{
> > +	req->uring_cmd.driver_cb(&req->uring_cmd);
> > +}
> > +
> > +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
> > +			void (*driver_cb)(struct io_uring_cmd *))
> > +{
> > +	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > +	req->uring_cmd.driver_cb = driver_cb;
> > +	req->io_task_work.func = io_uring_cmd_work;
> > +	io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>
> I'm still not a fan of the double indirect call here. I don't really
> have a good idea yet, but I plan to look into it.
>
> >  static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>
> Also it would be great to not add it between io_req_task_queue_fail and
> the callback set by it.

Right. Will change.

> > +void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret)
> > +{
> > +	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > +	if (ret < 0)
> > +		req_set_fail(req);
> > +	io_req_complete(req, ret);
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_done);
>
> It seems like all callers of io_req_complete actually call req_set_fail
> on failure. So maybe it would be a nice pre-cleanup to handle the
> req_set_fail call from io_req_complete?

io_tee and io_splice do slightly different handling. If we take this
route, it seems they can use __io_req_complete() instead.

> > +	/* queued async, consumer will call io_uring_cmd_done() when complete */
> > +	if (ret == -EIOCBQUEUED)
> > +		return 0;
> > +	io_uring_cmd_done(ioucmd, ret);
>
> Why not:
>
>	if (ret != -EIOCBQUEUED)
>		io_uring_cmd_done(ioucmd, ret);
>	return 0;
>
> That being said I wonder why not just remove the return value from
> ->async_cmd entirely and just require the implementation to always call
> io_uring_cmd_done? That removes the confusion on who needs to call it
> entirely, similarly to what we do in the block layer for ->submit_bio.

Right, this seems doable at this point as we do not do any special
handling (in io_uring) on the return code.
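Roughly, the issue path could then become something like this (just an
untested sketch to capture the direction; the io_uring_cmd() helper and
the ->async_cmd() hook names follow this series, the rest of the
plumbing here is my assumption, not the final patch):

	/*
	 * ->async_cmd() would return void; the provider must always
	 * complete the command via io_uring_cmd_done(), either inline
	 * or later from task/irq context.
	 */
	static int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
	{
		struct io_uring_cmd *ioucmd = &req->uring_cmd;
		struct file *file = req->file;

		file->f_op->async_cmd(ioucmd);
		/* no return code to interpret; completion always goes
		 * through io_uring_cmd_done() */
		return 0;
	}

That also keeps the -EIOCBQUEUED special-casing out of io_uring
entirely.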
> > +struct io_uring_cmd {
> > +	struct file	*file;
> > +	void		*cmd;
> > +	/* for irq-completion - if driver requires doing stuff in task-context*/
> > +	void (*driver_cb)(struct io_uring_cmd *cmd);
> > +	u32		flags;
> > +	u32		cmd_op;
> > +	u16		cmd_len;
>
> The cmd_len field does not seem to actually be used anywhere.

Another thing that got left out from the previous series :-(
We use this field for a bit of sanity checking at the moment.
Like this in nvme:

+	if (ioucmd->cmd_len != sizeof(struct nvme_passthru_cmd64))
+		return -EINVAL;
+	cptr = (struct nvme_passthru_cmd64 *)ioucmd->cmd;

> > +++ b/include/uapi/linux/io_uring.h
> > @@ -22,10 +22,12 @@ struct io_uring_sqe {
> >  	union {
> >  		__u64	off;	/* offset into file */
> >  		__u64	addr2;
> > +		__u32	cmd_op;
> >  	};
> >  	union {
> >  		__u64	addr;	/* pointer to buffer or iovecs */
> >  		__u64	splice_off_in;
> > +		__u16	cmd_len;
> >  	};
> >  	__u32	len;		/* buffer size or number of iovecs */
> >  	union {
> > @@ -60,7 +62,10 @@ struct io_uring_sqe {
> >  		__s32	splice_fd_in;
> >  		__u32	file_index;
> >  	};
> > -	__u64	__pad2[2];
> > +	union {
> > +		__u64	__pad2[2];
> > +		__u64	cmd;
> > +	};
>
> Can someone explain these changes to me a little more?

All three (cmd_op, cmd_len and cmd) are operation-specific fields.
User-space supplies them in the SQE, io_uring packages them into
"struct io_uring_cmd" and passes that down to the provider for the
real processing.

1. cmd_op: the operation code for the async cmd (e.g. a passthru ioctl
opcode, or whatever else we want to turn async from user-space)

2. cmd_len: the actual length of the async command (e.g. we have at
most 80 bytes with the big SQE; nvme-passthru uses the max, but other
use cases can go with a smaller length)

3. cmd: the starting offset where the async command is placed (by
user-space) inside the big SQE. 16 bytes sit in the first SQE, and the
next 64 bytes come from the second SQE. A rough user-space sketch at
the end of this mail shows how these fields get filled.

And if we were doing pointer-based command submission, this "cmd"
would have pointed to the user-space command (of whatever length).
Just to put the entire thought process across; I understand that we
are not taking that route.
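To make the layout concrete, roughly how user-space would fill these
fields for an nvme passthru command over a big SQE (untested,
liburing-style sketch; IORING_OP_URING_CMD and using the passthru
ioctl opcode as cmd_op follow this series, while the ring/fd setup and
the exact names are assumptions, not a final interface):

	/* needs <liburing.h>, <string.h>, <linux/nvme_ioctl.h> and the
	 * io_uring.h from this series */
	struct nvme_passthru_cmd64 pcmd = {
		.opcode	= 0x02,		/* nvme read, as an example */
		/* .nsid, .addr, .data_len, .cdw10..., as for the ioctl */
	};
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);	/* ring set up with big SQEs */

	sqe->opcode = IORING_OP_URING_CMD;	/* async-cmd opcode from this series */
	sqe->fd = nvme_char_fd;			/* the nvme char/block device fd */
	sqe->cmd_op = NVME_IOCTL_IO64_CMD;	/* which operation to run async */
	sqe->cmd_len = sizeof(pcmd);		/* 80 bytes; the driver sanity-checks this */
	/* payload starts at sqe->cmd: 16 bytes in this SQE, the remaining
	 * 64 bytes spill into the second SQE slot */
	memcpy(&sqe->cmd, &pcmd, sizeof(pcmd));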