On 2/20/21 7:50 AM, Jens Axboe wrote: > On 2/19/21 8:57 PM, Stefan Metzmacher wrote: >> Hi Jens, >> >> Am 28.01.21 um 03:19 schrieb Jens Axboe: >>>>> Assuming that I got that right, that means that the pdu information >>>>> doesn't actually go all the way to the end of the sqe, which currently >>>>> is just a bunch of padding. Was that intentional, or does this mean >>>>> that io_uring_pdu could actually be 8 bytes longer? >>>> >>>> Also correct. The reason is actually kind of stupid, and I think we >>>> should just fix that up. struct io_uring_cmd should fit within the first >>>> cacheline of io_kiocb, to avoid bloating that one. But with the members >>>> in there, it ends up being 8 bytes too big, if we grab those 8 bytes. >>>> What I think we should do is get rid of ->done, and just have drivers >>>> call io_uring_cmd_done() instead. We can provide an empty hook for that. >>>> Then we can reclaim the 8 bytes, and grow the io_uring_cmd to 56 bytes. >>> >>> Pushed out that version: >>> >>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v2 >>> >>> which gives you the full 56 bytes for the payload command. >> >> I think we only have 48 bytes for the payload. > > You are right, it's 64b minus 8 for the head, and 8 for user_data. > >> I've rebased and improved your io_uring-fops.v2 on top of your io_uring-worker.v3. > > Heh, I did that myself yesterday too, when I was folding in two fixes! > >> See https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/io_uring-fops > > Had a quick look, and some good stuff in there. So thanks for that. One > question, though - if you look at mine, you'll see I moved the force_nonblock > to be using the issue_flags instead. You dropped that from the issue path, > we definitely need that to be able to punt the request if we're in the > nonblock/fast path. > >> >> I've changed the layout like this: >> >> struct io_uring_sqe { >> __u8 opcode; /* type of operation for this sqe */ >> __u8 flags; /* IOSQE_ flags */ >> union { >> __u16 ioprio; /* ioprio for the request */ >> __u16 cmd_personality; /* IORING_OP_URING_CMD */ >> }; >> __s32 fd; /* file descriptor to do IO on */ >> union { >> __u64 off; /* offset into file */ >> __u64 addr2; >> __u64 cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */ >> }; >> union { >> __u64 addr; /* pointer to buffer or iovecs */ >> __u64 splice_off_in; >> __u64 cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */ >> }; >> >> And then use: >> >> struct io_uring_cmd_pdu { >> __u64 data[6]; /* 48 bytes available for free use */ >> }; >> >> So we effectively have this: >> >> struct io_uring_cmd_sqe { >> __u8 opcode; /* type of operation for this sqe */ >> __u8 flags; /* IOSQE_ flags */ >> __u16 cmd_personality; /* IORING_OP_URING_CMD */ >> __s32 fd; /* file descriptor to do IO on */ >> __u64 cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */ >> union { >> __u64 cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */ >> struct io_uring_cmd_pdu cmd_pdu; >> }; >> } >> >> I think it's saner to have a complete block of 48 bytes available for the payload >> and move personality and user_data to to top if opcode is IORING_OP_URING_CMD >> instead of having a hole that can't be touched. >> >> I also finished the socket glue from struct file -> struct socket -> struct sock >> >> I think it compiles, but I haven't done any tests. >> >> What do you think? > > I've been thinking along the same lines, because having a sparse sqe layout > for the uring cmd is a pain. I do think 'personality' is a bit too specific > to be part of the shared space, that should probably belong in the pdu > instead if the user needs it. One thing they all have in common is that they'd > need a sub-command, so why not make that u16 that? > > There's also the option of simply saying that the uring_cmd sqe is just > a different type, ala: > > struct io_uring_cmd_sqe { > __u8 opcode; /* IO_OP_URING_CMD */ > __u8 flags; > __u16 target_op; > __s32 fd; > __u64 user_data; > strut io_uring_cmd_pdu cmd_pdu; > }; > > which is essentially the same as your suggestion in terms of layout > (because that is the one that makes the most sense), we just don't try > and shoe-horn it into the existing sqe. As long as we overlap > opcode/flags, then init is fine. And past init, sqe is already consumed. > > Haven't tried and wire that up yet, and it may just be that the simple > layout change you did is just easier to deal with. The important part > here is the layout, and I certainly think we should do that. There's > effectively 54 bytes of data there, if you include the target op and fd > as part of that space. 48 fully usable for whatever. OK, folded in some of your stuff, and pushed out a new branch. Find it here: https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v3 I did notice while doing so that you put the issue flags in the cmd, I've made them external again. Just seems cleaner to me, otherwise you'd have to modify the command for reissue rather than just pass in the flags directly. I also retained struct file * in the cmd - that's a requirement for the layout of io_kiocb, so might as well keep it in there and not pass in the file. Plus that one won't ever change... Since we just need that one branch in req init, I do think that your suggestion of just modifying io_uring_sqe is the way to go. So that's what the above branch does. I tested the block side, and it works for getting the bs of the device. That's all the testing that has been done so far :-) Comments welcome! Would like to move this one forward and hopefully target 5.13 for it. -- Jens Axboe