On Thu, Mar 10, 2022 at 7:49 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Thu, Mar 10, 2022 at 05:20:13PM +0530, Kanchan Joshi wrote:
> > In sync ioctl, we always update this result field by doing put_user on
> > completion.
> > For async ioctl, since the command is inside the sqe, its lifetime is
> > only up to submission. The SQE may get reused post submission, leaving no
> > way to update the "result" field on completion. Had this field been a
> > pointer, we could have saved this on submission and updated on
> > completion. But that would require redesigning this structure and
> > adding a newer ioctl in nvme.
>
> Why would it require adding an ioctl to nvme? The whole io_uring
> async_cmd infrastructure is completely independent from ioctls.

io_uring is certainly not peeking into the ioctl and its command
structure; it only offers the facility to store that ioctl command
inline in its SQE. The problem is that the inline facility does not go
well with this particular nvme-passthru ioctl (NVME_IOCTL_IO64_CMD),
because this ioctl requires the additional "__u64 result;" inside
"struct nvme_passthru_cmd64" to be updated on completion. To update it
then, we need, at the very least, the result field to be a pointer
("__u64 result_ptr") inside struct nvme_passthru_cmd64. Do you see a
way to do that without adding a new passthru ioctl in nvme?

> > Coming back, even though sync-ioctl always updates this result to
> > user-space, only a few nvme io commands (e.g. zone-append, copy,
> > zone-mgmt-send) can return this additional result (spec-wise).
> > Therefore in nvme, when we are dealing with inline-sqe commands from
> > io_uring, we never attempt to update the result. And since we don't
> > update the result, we limit support to only read/write passthru
> > commands. And fail any other command during submission itself (Patch
> > 2).
>
> Yikes. That is outright horrible. passthrough needs to be command
> agnostic and future proof to any newly added nvme command.
This patch (along with patch 16) does exactly that: it makes the
infrastructure command-agnostic and future-proof. All nvme commands will
work with it; the application just needs to pass a pointer to the
ioctl command rather than placing it inline inside the sqe.

Overall, I think at the io_uring infra level both submission modes make
sense: big-sqe based inline submission (more efficient for commands
<= 80 bytes) and normal-sqe based non-inline/indirect submission. At the
nvme level, we have to pick one (depending on the ioctl at hand).
Currently we are playing with both, constructing a sort of fast path
(for all commands) and another faster path (only for read/write
commands). Should we (at the nvme level) rather opt out and use only the
indirect one (because it works for all commands), or must we build a way
to enable the inline one for all commands?

> > > Overly long line.
> >
> > Under 100, but sure, can fold it under 80.
>
> You can only use 100 sparingly if it makes the code more readable. Which
> I know is fuzzy, and in practice never does. Certainly not in nvme and
> block code.

That clears it up, thanks.

--
Kanchan