Re: [PATCH] btrfs: add io_uring interface for encoded writes

On 11/12/24 9:29 AM, Mark Harmstone wrote:
> Add an io_uring interface for encoded writes, with the same parameters
> as the BTRFS_IOC_ENCODED_WRITE ioctl.
> 
> As with the encoded reads code, there's a test program for this at
> https://github.com/maharmstone/io_uring-encoded, and I'll get this
> worked into an fstest.
> 
> How io_uring works is that it initially calls btrfs_uring_cmd with the
> IO_URING_F_NONBLOCK flag set, and if we return -EAGAIN it tries again in
> a kthread with the flag cleared.
    ^^^^^^^^

Not a kernel thread, it's an io worker. The distinction may seem
irrelevant, but it's really not - io workers inherit all the properties
of the original task.

> Ideally we'd honour this and call try_lock etc., but there's still a lot
> of work to be done to create non-blocking versions of all the functions
> in our write path. Instead, just validate the input in
> btrfs_uring_encoded_write() on the first pass and return -EAGAIN, with a
> view to properly optimizing the happy path later on.

But you need to ensure stable state after the first issue, regardless of
how you handle it. I don't have the other patches handy, but whatever
you copy from userspace before you return -EAGAIN, you should not be
copying again. By the time you get the 2nd invocation from io-wq, no
copying should be taking place, you should be using the state you
already ensured was stable for the non-blocking issue.

Maybe this is all handled by the caller of btrfs_uring_encoded_write()
already? From the code below, it looks like it copies everything,
returns -EAGAIN, and then copies it all again later. Yes, uring_cmd
will make the sqe itself stable, but:

	sqe_addr = u64_to_user_ptr(READ_ONCE(cmd->sqe->addr));

the userspace btrfs_ioctl_encoded_io_args that sqe->addr points to
should remain stable as well. If not, consider userspace doing:

some_func()
{
	struct btrfs_ioctl_encoded_io_args args;

	fill_in_args(&args);
	sqe = io_uring_get_sqe(ring);
	sqe->addr = &args;
	io_uring_submit(ring);		<- initial invocation here
}

main_func()
{
	some_func();
				- io-wq invocation perhaps here
	wait_on_cqes();
}

where io-wq will be reading garbage as args went out of scope, unless
some_func() used a stable/heap struct that isn't freed until completion.
some_func() can obviously wait on the cqe, but at that point you'd be
using it as a sync interface, and there's little point.
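
As a rough illustration of the "stable/heap struct that isn't freed until
completion" option, here is a minimal userspace model. It is not liburing
code: struct demo_args, struct demo_sqe, demo_prepare() and demo_complete()
are all made up for this sketch. The only real io_uring behaviour it leans
on is that sqe->user_data comes back unchanged in the cqe, which is what
lets the completion path find and free the args.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct btrfs_ioctl_encoded_io_args. */
struct demo_args {
	uint64_t offset;
	uint64_t len;
};

/* Hypothetical stand-in for the two sqe fields that matter here. */
struct demo_sqe {
	uint64_t addr;		/* what the kernel side would read from */
	uint64_t user_data;	/* echoed back in the completion */
};

/*
 * Allocate the args on the heap so they outlive the submitting
 * function, and stash the same pointer in user_data so the
 * completion handler can free them.
 */
static int demo_prepare(struct demo_sqe *sqe, uint64_t offset, uint64_t len)
{
	struct demo_args *args;

	assert(sqe != NULL);
	args = malloc(sizeof(*args));
	if (!args)
		return -ENOMEM;
	args->offset = offset;
	args->len = len;
	sqe->addr = (uint64_t)(uintptr_t)args;
	sqe->user_data = (uint64_t)(uintptr_t)args;
	return 0;
}

/* Call only once the completion has been reaped; free the args here. */
static void demo_complete(uint64_t user_data)
{
	free((void *)(uintptr_t)user_data);
}
```

With this shape the submitting function can return right away, and a
later io-wq invocation would still be reading valid memory through
sqe->addr.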

This is why io_kiocb->async_data exists. uring_cmd is already using that
for the sqe. I think you'd want to add a 2nd "void *op_data" or
something in there, have the uring_cmd alloc cache get path clear it to
NULL, and have the alloc cache put path kfree() it if it's non-NULL.

We'd also need to move the uring_cache struct into
include/linux/io_uring_types.h so that btrfs can get to it (and probably
rename it to something saner, uring_cmd_async_data for example).

static int btrfs_uring_encoded_write(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
	struct uring_cmd_async_data *data = req->async_data;
	void __user *sqe_addr = u64_to_user_ptr(READ_ONCE(cmd->sqe->addr));
	struct btrfs_ioctl_encoded_io_args *args;

	/* Copy from userspace only on the first issue */
	if (!data->op_data) {
		data->op_data = kmalloc(sizeof(*args), GFP_NOIO);
		if (!data->op_data)
			return -ENOMEM;
		if (copy_from_user(data->op_data, sqe_addr, sizeof(*args)))
			return -EFAULT;
	}
	/* Both the first issue and the io-wq retry use the stable copy */
	args = data->op_data;
	...
}

and have it be stable, then move your copying into a helper rather
than doing it inline in btrfs_uring_encoded_write() (it probably should
be a helper regardless). I ignored compat above, it's just pseudo code.
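
To make the two-pass behaviour concrete, the following is a plain
userspace model of the idea, not kernel code. Everything named model_* is
hypothetical, malloc()/free() stand in for kmalloc()/kfree(), and
memcpy() stands in for copy_from_user(). It only tries to show that the
args are copied once on the first issue and that the retry uses that
copy, whatever userspace has done to its buffer in the meantime.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct btrfs_ioctl_encoded_io_args. */
struct model_args {
	uint64_t offset;
	uint64_t len;
};

/* Hypothetical stand-in for the proposed uring_cmd_async_data. */
struct model_async_data {
	void *op_data;	/* NULL until the first issue copies the args */
};

/* Models the alloc cache "get" side: op_data starts out NULL. */
static void model_async_get(struct model_async_data *data)
{
	data->op_data = NULL;
}

/* Models the alloc cache "put" side: free op_data if it was set. */
static void model_async_put(struct model_async_data *data)
{
	free(data->op_data);
	data->op_data = NULL;
}

/*
 * Models one issue attempt. The args are copied only if no stable
 * copy exists yet. A nonblocking attempt then returns -EAGAIN; the
 * blocking retry reports what it would act on through *seen.
 */
static int model_encoded_write(struct model_async_data *data,
			       const struct model_args *user_args,
			       int nonblock, struct model_args *seen)
{
	assert(data != NULL);
	if (!data->op_data) {
		data->op_data = malloc(sizeof(*user_args));
		if (!data->op_data)
			return -ENOMEM;
		/* memcpy() stands in for copy_from_user() here. */
		memcpy(data->op_data, user_args, sizeof(*user_args));
	}
	if (nonblock)
		return -EAGAIN;
	*seen = *(struct model_args *)data->op_data;
	return 0;
}
```

In this model, clobbering the caller's struct between the two calls does
not change what the second call sees, which is the property the io-wq
retry needs.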

Anyway, hope that helps. I'll be happy to do the uring_cmd bit for you,
but it really should be pretty straightforward.

I'm also pondering if the encoded read side suffers from the same issue?

-- 
Jens Axboe



