On 12/11/24 21:01, Jens Axboe wrote:
> On 11/12/24 9:29 AM, Mark Harmstone wrote:
>> Add an io_uring interface for encoded writes, with the same parameters
>> as the BTRFS_IOC_ENCODED_WRITE ioctl.
>>
>> As with the encoded reads code, there's a test program for this at
>> https://github.com/maharmstone/io_uring-encoded, and I'll get this
>> worked into an fstest.
>>
>> How io_uring works is that it initially calls btrfs_uring_cmd with the
>> IO_URING_F_NONBLOCK flag set, and if we return -EAGAIN it tries again in
>> a kthread with the flag cleared.
>    ^^^^^^^^
>
> Not a kernel thread, it's an io worker. The distinction may seem
> irrelevant, but it's really not - io workers inherit all the properties
> of the original task.
>
>> Ideally we'd honour this and call try_lock etc., but there's still a lot
>> of work to be done to create non-blocking versions of all the functions
>> in our write path. Instead, just validate the input in
>> btrfs_uring_encoded_write() on the first pass and return -EAGAIN, with a
>> view to properly optimizing the happy path later on.
>
> But you need to ensure stable state after the first issue, regardless of
> how you handle it. I don't have the other patches handy, but whatever
> you copy from userspace before you return -EAGAIN, you should not be
> copying again. By the time you get the 2nd invocation from io-wq, no
> copying should be taking place; you should be using the state you
> already ensured was stable for the non-blocking issue.
>
> Maybe this is all handled by the caller of btrfs_uring_encoded_write()
> already? As far as looking at the code below, it just looks like it
> copies everything, then returns -EAGAIN, then copies it again later? Yes,
> uring_cmd will make the sqe itself stable, but:
>
> 	sqe_addr = u64_to_user_ptr(READ_ONCE(cmd->sqe->addr));
>
> the userspace btrfs_ioctl_encoded_io_args that sqe->addr points to
> should remain stable as well.
> If not, consider userspace doing:
>
> some_func()
> {
> 	struct btrfs_ioctl_encoded_io_args args;
>
> 	fill_in_args(&args);
> 	sqe = io_uring_get_sqe(ring);
> 	sqe->addr = &args;
> 	io_uring_submit();	<- initial invocation here
> }
>
> main_func()
> {
> 	some_func();
> 	- io-wq invocation perhaps here
> 	wait_on_cqes();
> }
>
> where io-wq will be reading garbage as args went out of scope, unless
> some_func() used a stable/heap struct that isn't freed until completion.
> some_func() can obviously wait on the cqe, but at that point you'd be
> using it as a sync interface, and there's little point.
>
> This is why io_kiocb->async_data exists. uring_cmd is already using that
> for the sqe, I think you'd want to add a 2nd "void *op_data" or
> something in there, and have the uring_cmd alloc cache get clear that to
> NULL and have the uring_cmd alloc cache put kfree() it if it's non-NULL.
>
> We'd also need to move the uring_cache struct into
> include/linux/io_uring_types.h so that btrfs can get to it (and probably
> rename it to something saner, uring_cmd_async_data for example).
>
> static int btrfs_uring_encoded_write(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
> 	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> 	struct uring_cmd_async_data *data = req->async_data;
> 	struct btrfs_ioctl_encoded_io_args *args;
>
> 	if (!data->op_data) {
> 		data->op_data = kmalloc(sizeof(*args), GFP_NOIO);
> 		if (!data->op_data)
> 			return -ENOMEM;
> 		if (copy_from_user(data->op_data, sqe_addr, sizeof(*args)))
> 			return -EFAULT;
> 	}
> 	...
> }
>
> and have it be stable, then move your copying into a helper rather
> than inline in btrfs_uring_encoded_write() (it probably should be
> regardless). Ignored the compat above, it's just pseudo code.
>
> Anyway, hope that helps. I'll be happy to do the uring_cmd bit for you,
> but it really should be pretty straightforward.
>
> I'm also pondering if the encoded read side suffers from the same issue?
>

Thanks Jens, that makes sense to me.

Mark