Because it does quiesce, fixed read/write can access buffers from
asynchronous contexts without synchronisation. That won't work
anymore, so
1. either we save it in advance, but that would require an extra
req_async allocation for linked fixed rw
2. or synchronise whenever we go async. But that would mean a request
may get and do IO on two different buffers, and that's rotten.
3. do it mixed -- be lazy, but allocate if it actually has to do IO.
3.5 also "synchronise" there would mean uring_lock, which is not
welcome, but we can probably do rcu.
Are you referring to a case where a fixed buffer request can be
submitted from async context while those buffers are being
unregistered, or something like that?
Let me think of a patch...
The most convenient API would be [1]: it selects a buffer during
submission, but allocates async data if the request needs to go
async, and for all linked requests.
[2] should be correct from the kernel perspective (no races). It
also avoids doing IO on 2 different buffers, which is nasty (BTW,
[1] solves this problem naturally). However, a buffer might be
selected async, so the following can happen, and the user would
have to wait for request completion before removing a buffer:
1. register buf id=0
2. syscall io_uring_enter(submit=RW_FIXED,buf_id=0,IOSQE_ASYNC)
3. unregister buffers
4. the request may not find the buffer and fail.
Not very convenient, and it can actually add overhead on the
userspace side, possibly even some heavy synchronisation.
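To make that ordering concrete, here is a rough userspace sketch of
the sequence above using liburing (file path, sizes and error
handling are arbitrary or omitted; it is an illustration, not part
of the patches below):

/* Illustration of the ordering above: register a buffer, submit a
 * fixed read forced to io-wq, then unregister before completion.
 * With [2] the async import may no longer find the buffer and the
 * request can fail; with [1] the buffer would already have been
 * picked up at submission time.
 */
#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	struct iovec iov = { .iov_base = malloc(4096), .iov_len = 4096 };
	int fd = open("/tmp/testfile", O_RDONLY);

	io_uring_queue_init(8, &ring, 0);

	/* 1. register buf id=0 */
	io_uring_register_buffers(&ring, &iov, 1);

	/* 2. submit READ_FIXED with buf_index=0, forced async */
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);
	sqe->flags |= IOSQE_ASYNC;
	io_uring_submit(&ring);

	/* 3. unregister buffers before the request completes */
	io_uring_unregister_buffers(&ring);

	/* 4. the request may not find the buffer; cqe->res may be an error */
	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}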
uring_lock in [2] is not nice, but I think I can replace it with
rcu; that could probably even help with sharing, but I need to try
to implement it to be sure.
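To sketch what the rcu variant might look like (all type and field
names below are made up for illustration, this is not actual
io_uring code):

/* Hypothetical sketch: protect the fixed buffer table with RCU
 * instead of taking uring_lock around io_import_fixed(). Every name
 * here (fixed_buf_table, my_ring_ctx, ...) is invented.
 */
#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/bvec.h>
#include <linux/errno.h>

struct fixed_buf_table {
	unsigned int nr_bufs;
	struct bio_vec *bvecs;		/* registered, pinned pages */
	struct rcu_head rcu;
};

struct my_ring_ctx {
	struct fixed_buf_table __rcu *buf_table;
	struct mutex uring_lock;
};

/* reader side, e.g. what io_import_fixed() would do from io-wq */
static int my_lookup_fixed_buf(struct my_ring_ctx *ctx, unsigned int idx)
{
	struct fixed_buf_table *table;
	int ret = -EFAULT;

	rcu_read_lock();
	table = rcu_dereference(ctx->buf_table);
	if (table && idx < table->nr_bufs) {
		/* build the iov_iter from table->bvecs[idx] here; note
		 * the bvecs/pages still need their own lifetime rule
		 * (e.g. a refcount), as the IO outlives this section
		 */
		ret = 0;
	}
	rcu_read_unlock();
	return ret;
}

/* unregister side, still under uring_lock */
static void my_unregister_bufs(struct my_ring_ctx *ctx)
{
	struct fixed_buf_table *old;

	old = rcu_dereference_protected(ctx->buf_table,
					lockdep_is_held(&ctx->uring_lock));
	rcu_assign_pointer(ctx->buf_table, NULL);
	/* the table itself is freed only after a grace period */
	kfree_rcu(old, rcu);
}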
So which API to have is still an open question.
Neither of the diffs is tested.
[1]
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7e35283fc0b1..2171836a9ce3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -826,6 +826,7 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1,
.unbound_nonreg_file = 1,
.pollin = 1,
+ .needs_async_data = 1,
.plug = 1,
.async_size = sizeof(struct io_async_rw),
.work_flags = IO_WQ_WORK_BLKCG | IO_WQ_WORK_MM,
@@ -835,6 +836,7 @@ static const struct io_op_def io_op_defs[] = {
.hash_reg_file = 1,
.unbound_nonreg_file = 1,
.pollout = 1,
+ .needs_async_data = 1,
.plug = 1,
.async_size = sizeof(struct io_async_rw),
.work_flags = IO_WQ_WORK_BLKCG | IO_WQ_WORK_FSIZE |
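The gist of [1]: marking READ_FIXED/WRITE_FIXED as needs_async_data
makes the core allocate per-request async state up front, so the
fixed buffer can be imported in the submission path and the saved
state reused from io-wq. A toy model of that idea (all names below
are made up, this is not the real io_uring flow):

/* Toy model of the needs_async_data idea in [1]; every name here is
 * invented for illustration and does not match fs/io_uring.c.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_op_def {
	bool needs_async_data;	/* allocate async state at prep time */
	size_t async_size;	/* size of that state */
};

struct toy_req {
	int opcode;
	void *async_data;	/* e.g. a saved iov_iter for rw */
};

/* Called during submission, before the request can go async: if the
 * opcode needs async data, allocate it now, so that an async worker
 * only reuses saved state and never re-imports fixed buffers itself.
 */
static int toy_prep(struct toy_req *req, const struct toy_op_def *def)
{
	if (def->needs_async_data) {
		req->async_data = calloc(1, def->async_size);
		if (!req->async_data)
			return -1;
	}
	return 0;
}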
[2]
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7e35283fc0b1..31560b879fb3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3148,7 +3148,12 @@ static ssize_t io_import_iovec(int rw, struct io_kiocb *req,
opcode = req->opcode;
if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) {
*iovec = NULL;
- return io_import_fixed(req, rw, iter);
+
+ io_ring_submit_lock(req->ctx, needs_lock);
+ lockdep_assert_held(&req->ctx->uring_lock);
+ ret = io_import_fixed(req, rw, iter);
+ io_ring_submit_unlock(req->ctx, needs_lock);
+ return ret;
}
/* buffer index only valid with fixed read/write, or buffer select */
@@ -3638,7 +3643,7 @@ static int io_write(struct io_kiocb *req, bool force_nonblock,
copy_iov:
/* some cases will consume bytes even on error returns */
iov_iter_revert(iter, io_size - iov_iter_count(iter));
- ret = io_setup_async_rw(req, iovec, inline_vecs, iter, false);
+ ret = io_setup_async_rw(req, iovec, inline_vecs, iter, true);
if (!ret)
return -EAGAIN;
}
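For reference, the io_ring_submit_lock()/io_ring_submit_unlock()
helpers used in [2] only take uring_lock when the caller doesn't
already hold it, roughly (paraphrased from memory, not copied from
fs/io_uring.c):

/* Paraphrased behaviour of the helpers used in [2]: take uring_lock
 * only when the caller is not already under it, i.e. needs_lock is
 * true for the io-wq/async path and false for the submission path,
 * which already holds the lock.
 */
static void io_ring_submit_lock(struct io_ring_ctx *ctx, bool needs_lock)
{
	if (needs_lock)
		mutex_lock(&ctx->uring_lock);
}

static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)
{
	if (needs_lock)
		mutex_unlock(&ctx->uring_lock);
}

That's why the lockdep_assert_held() in [2] should hold on both
paths: the regular submission path already runs under uring_lock,
and the io-wq path takes it via needs_lock.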