Hi,

Discussed this with Pavel, and on his suggestion, I tried prototyping a
"buffer update" opcode. Basically it works like IORING_REGISTER_BUFFERS_UPDATE
in that it can update an existing buffer registration, but it's an sqe rather
than a synchronous register operation. The idea here is that you could do the
update upfront, or as part of a chain, and have the buffer be generically
available, just like any other buffer that was registered upfront. You do need
an empty table registered first, which can just be sparse. And since you can
pick the slot it goes into, you can rely on that slot afterwards (either as a
link, or just in the following sqe).

Quick'n dirty obviously, but I did write a quick test case too to verify that:

1) It actually works (it seems to)

2) It's not too slow (it seems not to be - I can get ~2.5M updates per second
   in a vm on my laptop, which isn't too bad)

Not saying this is perfect, but perhaps it's worth entertaining an idea like
that? It has the added benefit of being persistent across system calls as
well, unless you do another IORING_OP_BUF_UPDATE at the end of your chain to
reset it.

Comments? Could it be useful for this?
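To make the intended usage a bit more concrete, below is a rough sketch of
what the userspace side could look like with liburing. This is not the test
case I ran, it's untested as written and error handling is omitted. There's no
prep helper for the proposed opcode, so the sqe is filled in by hand, and the
fd, slot and buffer size are just made up for the example:

#include <liburing.h>
#include <string.h>

/*
 * Sketch only: register a sparse buffer table, then use the (proposed)
 * IORING_OP_BUF_UPDATE to populate slot 0 and link a fixed read to it.
 * 'fd' is assumed to be an already-open file, error checking is omitted.
 */
static int buf_update_example(int fd)
{
	struct io_uring_rsrc_update2 up = { };
	struct io_uring_sqe *sqe;
	struct io_uring ring;
	struct iovec iov;
	char buf[4096];

	io_uring_queue_init(8, &ring, 0);
	/* empty table with 8 sparse slots, registered once up front */
	io_uring_register_buffers_sparse(&ring, 8);

	iov.iov_base = buf;
	iov.iov_len = sizeof(buf);
	up.offset = 0;				/* slot to fill */
	up.data = (unsigned long) &iov;		/* iovec(s) describing the buffer */
	up.nr = 1;

	/* sqe 1: buffer update, no prep helper yet so fill it in by hand */
	sqe = io_uring_get_sqe(&ring);
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = IORING_OP_BUF_UPDATE;
	sqe->addr = (unsigned long) &up;
	sqe->len = 1;
	sqe->flags = IOSQE_IO_LINK;

	/* sqe 2: fixed read from the slot that was just populated */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read_fixed(sqe, fd, buf, sizeof(buf), 0, 0);

	return io_uring_submit(&ring);
}

As noted above, the link isn't strictly required either - issuing the update
ahead of the sqe that uses the slot in the same submission works as well.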
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 86cb385fe0b5..02d4b66267ef 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -259,6 +259,7 @@ enum io_uring_op {
 	IORING_OP_FTRUNCATE,
 	IORING_OP_BIND,
 	IORING_OP_LISTEN,
+	IORING_OP_BUF_UPDATE,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index a2be3bbca5ff..cda35d22397d 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -515,6 +515,10 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_eopnotsupp_prep,
 #endif
 	},
+	[IORING_OP_BUF_UPDATE] = {
+		.prep			= io_buf_update_prep,
+		.issue			= io_buf_update,
+	},
 };
 
 const struct io_cold_def io_cold_defs[] = {
@@ -742,6 +746,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_LISTEN] = {
 		.name			= "LISTEN",
 	},
+	[IORING_OP_BUF_UPDATE] = {
+		.name			= "BUF_UPDATE",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 33a3d156a85b..6f0071733018 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1236,3 +1236,44 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 	fput(file);
 	return ret;
 }
+
+struct io_buf_update {
+	struct file *file;
+	struct io_uring_rsrc_update2 up;
+};
+
+int io_buf_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_buf_update *ibu = io_kiocb_to_cmd(req, struct io_buf_update);
+	struct io_uring_rsrc_update2 __user *uaddr;
+
+	if (!req->ctx->buf_data)
+		return -ENXIO;
+	if (sqe->ioprio || sqe->fd || sqe->addr2 || sqe->rw_flags ||
+	    sqe->splice_fd_in)
+		return -EINVAL;
+	if (sqe->len != 1)
+		return -EINVAL;
+
+	uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	if (copy_from_user(&ibu->up, uaddr, sizeof(*uaddr)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int io_buf_update(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_buf_update *ibu = io_kiocb_to_cmd(req, struct io_buf_update);
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	ret = __io_register_rsrc_update(ctx, IORING_RSRC_BUFFER, &ibu->up, ibu->up.nr);
+	io_ring_submit_unlock(ctx, issue_flags);
+
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+	return 0;
+}
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 8ed588036210..d41e75c956ef 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -142,4 +142,7 @@ static inline void __io_unaccount_mem(struct user_struct *user,
 	atomic_long_sub(nr_pages, &user->locked_vm);
 }
 
+int io_buf_update_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_buf_update(struct io_kiocb *req, unsigned int issue_flags);
+
 #endif

-- 
Jens Axboe