On 1/27/21 5:38 PM, Darrick J. Wong wrote: > On Wed, Jan 27, 2021 at 02:25:38PM -0700, Jens Axboe wrote: >> This is a file private kind of request. io_uring doesn't know what's >> in this command type, it's for the file_operations->uring_cmd() >> handler to deal with. >> >> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> >> --- >> fs/io_uring.c | 59 +++++++++++++++++++++++++++++++++++ >> include/linux/io_uring.h | 12 +++++++ >> include/uapi/linux/io_uring.h | 1 + >> 3 files changed, 72 insertions(+) >> >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 03748faa5295..55c2714a591e 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -712,6 +712,7 @@ struct io_kiocb { >> struct io_shutdown shutdown; >> struct io_rename rename; >> struct io_unlink unlink; >> + struct io_uring_cmd uring_cmd; >> /* use only after cleaning per-op data, see io_clean_op() */ >> struct io_completion compl; >> }; >> @@ -805,6 +806,8 @@ struct io_op_def { >> unsigned needs_async_data : 1; >> /* should block plug */ >> unsigned plug : 1; >> + /* doesn't support personality */ >> + unsigned no_personality : 1; >> /* size of async data needed, if any */ >> unsigned short async_size; >> unsigned work_flags; >> @@ -998,6 +1001,11 @@ static const struct io_op_def io_op_defs[] = { >> .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_FILES | >> IO_WQ_WORK_FS | IO_WQ_WORK_BLKCG, >> }, >> + [IORING_OP_URING_CMD] = { >> + .needs_file = 1, >> + .no_personality = 1, >> + .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG, >> + }, >> }; >> >> enum io_mem_account { >> @@ -3797,6 +3805,47 @@ static int io_unlinkat(struct io_kiocb *req, bool force_nonblock) >> return 0; >> } >> >> +static void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret) >> +{ >> + struct io_kiocb *req = container_of(cmd, struct io_kiocb, uring_cmd); >> + >> + if (ret < 0) >> + req_set_fail_links(req); >> + io_req_complete(req, ret); >> +} >> + >> +static int io_uring_cmd_prep(struct io_kiocb *req, >> + const struct io_uring_sqe *sqe) >> +{ >> + struct io_uring_cmd *cmd = &req->uring_cmd; >> + >> + if (!req->file->f_op->uring_cmd) >> + return -EOPNOTSUPP; >> + >> + memcpy(&cmd->pdu, (void *) &sqe->off, sizeof(cmd->pdu)); > > Hmmm. struct io_uring_pdu is (by my count) 6x uint64_t (==48 bytes) in > size. This starts copying the pdu from byte 8 in struct io_uring_sqe, > and the sqe is 64 bytes in size. Correct > I guess (having not played much with io_uring) that the stuff in the > first eight bytes of the sqe are header info that's common to all > io_uring operations, and hence not passed to io_uring_cmd*. Exactly > Assuming that I got that right, that means that the pdu information > doesn't actually go all the way to the end of the sqe, which currently > is just a bunch of padding. Was that intentional, or does this mean > that io_uring_pdu could actually be 8 bytes longer? Also correct. The reason is actually kind of stupid, and I think we should just fix that up. struct io_uring_cmd should fit within the first cacheline of io_kiocb, to avoid bloating that one. But with the members in there, it ends up being 8 bytes too big, if we grab those 8 bytes. What I think we should do is get rid of ->done, and just have drivers call io_uring_cmd_done() instead. We can provide an empty hook for that. Then we can reclaim the 8 bytes, and grow the io_uring_cmd to 56 bytes. > Also, I thought io_uring_seq.user_data was supposed to coincide with > io_uring_pdu.reserved? They don't seem to...? > > (I could be totally off here, fwiw.) I think you are, I even added a BUILD check for that: BUILD_BUG_ON(offsetof(struct io_uring_sqe, user_data) != offsetof(struct io_uring_pdu, reserved)); to ensure that that is the case. > The reason why I'm counting bytes so stingily is that xfs_scrub issues > millions upon millions of ioctl calls to scrub an XFS. Wouldn't it be > nice if there was a way to submit a single userspace buffer to the > kernel and let it run every scrubber for that fs object in order? I > could cram all that data into the pdu struct ... if it had 56 bytes of > space. For other purposes too, the bigger we can make the inline data, the more likely we are that we can fit everything we need in there. I'm going to make the change to bump it to 56 bytes. > If not, it wouldn't be a big deal to use one of the data[4] fields as a > pointer to a larger struct, but where's the fun in that? :) Agree :-) > Granted I'm programming speculatively in my head, not building an actual > prototype. There are all kinds of other questions I have, like, can a > uring command handler access the task struct or the userspace memory of > the process it was called from? What happens when the user is madly > pounding on ^C while uring commands are running? I should probably > figure out the answers to those questions and maybe even write/crib a > program first... Well, either the program would exit if it had no SIGINT handler, and that would signal the async task handling it and cancel it. Or you handle it, and then you need to cancel on your own. -- Jens Axboe