Re: [PATCH 2/5] io_uring: add support for IORING_OP_URING_CMD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/27/21 5:38 PM, Darrick J. Wong wrote:
> On Wed, Jan 27, 2021 at 02:25:38PM -0700, Jens Axboe wrote:
>> This is a file private kind of request. io_uring doesn't know what's
>> in this command type, it's for the file_operations->uring_cmd()
>> handler to deal with.
>>
>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>> ---
>>  fs/io_uring.c                 | 59 +++++++++++++++++++++++++++++++++++
>>  include/linux/io_uring.h      | 12 +++++++
>>  include/uapi/linux/io_uring.h |  1 +
>>  3 files changed, 72 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 03748faa5295..55c2714a591e 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -712,6 +712,7 @@ struct io_kiocb {
>>  		struct io_shutdown	shutdown;
>>  		struct io_rename	rename;
>>  		struct io_unlink	unlink;
>> +		struct io_uring_cmd	uring_cmd;
>>  		/* use only after cleaning per-op data, see io_clean_op() */
>>  		struct io_completion	compl;
>>  	};
>> @@ -805,6 +806,8 @@ struct io_op_def {
>>  	unsigned		needs_async_data : 1;
>>  	/* should block plug */
>>  	unsigned		plug : 1;
>> +	/* doesn't support personality */
>> +	unsigned		no_personality : 1;
>>  	/* size of async data needed, if any */
>>  	unsigned short		async_size;
>>  	unsigned		work_flags;
>> @@ -998,6 +1001,11 @@ static const struct io_op_def io_op_defs[] = {
>>  		.work_flags		= IO_WQ_WORK_MM | IO_WQ_WORK_FILES |
>>  						IO_WQ_WORK_FS | IO_WQ_WORK_BLKCG,
>>  	},
>> +	[IORING_OP_URING_CMD] = {
>> +		.needs_file		= 1,
>> +		.no_personality		= 1,
>> +		.work_flags		= IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG,
>> +	},
>>  };
>>  
>>  enum io_mem_account {
>> @@ -3797,6 +3805,47 @@ static int io_unlinkat(struct io_kiocb *req, bool force_nonblock)
>>  	return 0;
>>  }
>>  
>> +static void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret)
>> +{
>> +	struct io_kiocb *req = container_of(cmd, struct io_kiocb, uring_cmd);
>> +
>> +	if (ret < 0)
>> +		req_set_fail_links(req);
>> +	io_req_complete(req, ret);
>> +}
>> +
>> +static int io_uring_cmd_prep(struct io_kiocb *req,
>> +			     const struct io_uring_sqe *sqe)
>> +{
>> +	struct io_uring_cmd *cmd = &req->uring_cmd;
>> +
>> +	if (!req->file->f_op->uring_cmd)
>> +		return -EOPNOTSUPP;
>> +
>> +	memcpy(&cmd->pdu, (void *) &sqe->off, sizeof(cmd->pdu));
> 
> Hmmm.  struct io_uring_pdu is (by my count) 6x uint64_t (==48 bytes) in
> size.  This starts copying the pdu from byte 8 in struct io_uring_sqe,
> and the sqe is 64 bytes in size.

Correct

> I guess (having not played much with io_uring) that the stuff in the
> first eight bytes of the sqe are header info that's common to all
> io_uring operations, and hence not passed to io_uring_cmd*.

Exactly

> Assuming that I got that right, that means that the pdu information
> doesn't actually go all the way to the end of the sqe, which currently
> is just a bunch of padding.  Was that intentional, or does this mean
> that io_uring_pdu could actually be 8 bytes longer?

Also correct. The reason is actually kind of stupid, and I think we
should just fix that up. struct io_uring_cmd should fit within the first
cacheline of io_kiocb, to avoid bloating that one. But with the members
in there, it ends up being 8 bytes too big, if we grab those 8 bytes.
What I think we should do is get rid of ->done, and just have drivers
call io_uring_cmd_done() instead. We can provide an empty hook for that.
Then we can reclaim the 8 bytes, and grow the io_uring_cmd to 56 bytes.

> Also, I thought io_uring_seq.user_data was supposed to coincide with
> io_uring_pdu.reserved?  They don't seem to...?
>
> (I could be totally off here, fwiw.)

I think you are, I even added a BUILD check for that:

BUILD_BUG_ON(offsetof(struct io_uring_sqe, user_data) !=
	     offsetof(struct io_uring_pdu, reserved));

to ensure that that is the case.

> The reason why I'm counting bytes so stingily is that xfs_scrub issues
> millions upon millions of ioctl calls to scrub an XFS.  Wouldn't it be
> nice if there was a way to submit a single userspace buffer to the
> kernel and let it run every scrubber for that fs object in order?  I
> could cram all that data into the pdu struct ... if it had 56 bytes of
> space.

For other purposes too, the bigger we can make the inline data, the more
likely we are that we can fit everything we need in there. I'm going to
make the change to bump it to 56 bytes.

> If not, it wouldn't be a big deal to use one of the data[4] fields as a
> pointer to a larger struct, but where's the fun in that? :)

Agree :-)

> Granted I'm programming speculatively in my head, not building an actual
> prototype.  There are all kinds of other questions I have, like, can a
> uring command handler access the task struct or the userspace memory of
> the process it was called from?  What happens when the user is madly
> pounding on ^C while uring commands are running?  I should probably
> figure out the answers to those questions and maybe even write/crib a
> program first... 

Well, either the program would exit if it had no SIGINT handler, and
that would signal the async task handling it and cancel it. Or you
handle it, and then you need to cancel on your own.

-- 
Jens Axboe




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux