Re: [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/19/21 8:57 PM, Stefan Metzmacher wrote:
> Hi Jens,
> 
> Am 28.01.21 um 03:19 schrieb Jens Axboe:
>>>> Assuming that I got that right, that means that the pdu information
>>>> doesn't actually go all the way to the end of the sqe, which currently
>>>> is just a bunch of padding.  Was that intentional, or does this mean
>>>> that io_uring_pdu could actually be 8 bytes longer?
>>>
>>> Also correct. The reason is actually kind of stupid, and I think we
>>> should just fix that up. struct io_uring_cmd should fit within the first
>>> cacheline of io_kiocb, to avoid bloating that one. But with the members
>>> in there, it ends up being 8 bytes too big, if we grab those 8 bytes.
>>> What I think we should do is get rid of ->done, and just have drivers
>>> call io_uring_cmd_done() instead. We can provide an empty hook for that.
>>> Then we can reclaim the 8 bytes, and grow the io_uring_cmd to 56 bytes.
>>
>> Pushed out that version:
>>
>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v2
>>
>> which gives you the full 56 bytes for the payload command.
> 
> I think we only have 48 bytes for the payload.

You are right, it's 64b minus 8 for the head, and 8 for user_data.

> I've rebased and improved your io_uring-fops.v2 on top of your io_uring-worker.v3.

Heh, I did that myself yesterday too, when I was folding in two fixes!

> See https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/io_uring-fops

Had a quick look, and some good stuff in there. So thanks for that. One
question, though - if you look at mine, you'll see I moved the force_nonblock
to be using the issue_flags instead. You dropped that from the issue path,
we definitely need that to be able to punt the request if we're in the
nonblock/fast path.

> 
> I've changed the layout like this:
> 
> struct io_uring_sqe {
>         __u8    opcode;         /* type of operation for this sqe */
>         __u8    flags;          /* IOSQE_ flags */
>         union {
>                 __u16   ioprio;         /* ioprio for the request */
>                 __u16   cmd_personality; /* IORING_OP_URING_CMD */
>         };
>         __s32   fd;             /* file descriptor to do IO on */
>         union {
>                 __u64   off;    /* offset into file */
>                 __u64   addr2;
>                 __u64   cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */
>         };
>         union {
>                 __u64   addr;   /* pointer to buffer or iovecs */
>                 __u64   splice_off_in;
>                 __u64   cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */
>         };
> 
> And then use:
> 
> struct io_uring_cmd_pdu {
>        __u64 data[6]; /* 48 bytes available for free use */
> };
> 
> So we effectively have this:
> 
> struct io_uring_cmd_sqe {
>         __u8    opcode;         /* type of operation for this sqe */
>         __u8    flags;          /* IOSQE_ flags */
>         __u16   cmd_personality; /* IORING_OP_URING_CMD */
>         __s32   fd;             /* file descriptor to do IO on */
>         __u64   cmd_user_data; /* IORING_OP_URING_CMD: data to be passed back at completion time */
>         union {
>                 __u64   cmd_pdu_start; /* IORING_OP_URING_CMD: this is the start for the remaining 48 bytes */
>                 struct io_uring_cmd_pdu cmd_pdu;
>         };
> }
> 
> I think it's saner to have a complete block of 48 bytes available for the payload
> and move personality and user_data to to top if opcode is IORING_OP_URING_CMD
> instead of having a hole that can't be touched.
> 
> I also finished the socket glue from struct file -> struct socket -> struct sock
> 
> I think it compiles, but I haven't done any tests.
> 
> What do you think?

I've been thinking along the same lines, because having a sparse sqe layout
for the uring cmd is a pain. I do think 'personality' is a bit too specific
to be part of the shared space, that should probably belong in the pdu
instead if the user needs it. One thing they all have in common is that they'd
need a sub-command, so why not make that u16 that?

There's also the option of simply saying that the uring_cmd sqe is just
a different type, ala:

struct io_uring_cmd_sqe {
	__u8	opcode;		/* IO_OP_URING_CMD */
	__u8	flags;
	__u16	target_op;
	__s32	fd;
	__u64	user_data;
	strut io_uring_cmd_pdu cmd_pdu;
};

which is essentially the same as your suggestion in terms of layout
(because that is the one that makes the most sense), we just don't try
and shoe-horn it into the existing sqe. As long as we overlap
opcode/flags, then init is fine. And past init, sqe is already consumed.

Haven't tried and wire that up yet, and it may just be that the simple
layout change you did is just easier to deal with. The important part
here is the layout, and I certainly think we should do that. There's
effectively 54 bytes of data there, if you include the target op and fd
as part of that space. 48 fully usable for whatever.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux