On 11/1/24 01:04, Ming Lei wrote:
On Thu, Oct 31, 2024 at 01:16:07PM +0000, Pavel Begunkov wrote:
On 10/30/24 02:04, Ming Lei wrote:
On Wed, Oct 30, 2024 at 01:25:33AM +0000, Pavel Begunkov wrote:
On 10/30/24 00:45, Ming Lei wrote:
On Tue, Oct 29, 2024 at 04:47:59PM +0000, Pavel Begunkov wrote:
On 10/25/24 13:22, Ming Lei wrote:
...
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4bc0d762627d..5a2025d48804 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -245,7 +245,8 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
if (io_rw_alloc_async(req))
return -ENOMEM;
- if (!do_import || io_do_buffer_select(req))
+ if (!do_import || io_do_buffer_select(req) ||
+ io_use_leased_grp_kbuf(req))
return 0;
rw = req->async_data;
@@ -489,6 +490,11 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
}
req_set_fail(req);
req->cqe.res = res;
+ if (io_use_leased_grp_kbuf(req)) {
That's what I'm talking about: we're pushing more and more
into the generic paths (or patching every single hot opcode
there is). You said it's fine for ublk the way it was, i.e.
without tracking, so let's then pretend it's a ublk-specific
feature, kill that addition and settle on that if that's the
way to go.
As I mentioned before, it isn't ublk-specific: zeroing is required
because the buffer is a kernel buffer, that is all. Any other approach
needs this kind of handling too; the upcoming fuse zc needs it as well.
And it can't be done on the driver side, because the driver has no idea
how the kernel buffer will be consumed.
Also, it is only required in case of a short read/recv, which isn't a
hot path, not to mention that it is just one check on a request flag.
I agree, it's not hot, it's a failure path, and the recv side
is of medium hotness, but the main concern is that the feature
is too actively leaking into other requests.
The point is whether you'd like to support kernel buffers. If yes, this
kind of change can't be avoided.
There is no guarantee with the patchset that there will be any IO done
with that buffer, e.g. place a nop into the group, and even then you
have offsets and length, so it's not clear what the zeroing is supposed
to achieve.
Yes, here it depends on the user. In the case of ublk, the application
has to be trusted, and the situation is the same with other
user-emulated storage, such as qemu.
The buffer may be a page cache page; if it isn't completely initialized,
kernel data may be leaked to userspace via mmap.
Either the buffer comes fully "initialised", i.e. free of
kernel private data, or we need to track what parts of the buffer were
used.
That is why the only workable way is to zero the remainder in the
consumer OP, imo.
If it can leak kernel data in some way, I'm afraid zeroing the
remainder alone won't be enough to prevent it, e.g. the recv/read
len doesn't have to match the buffer size.
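A rough userspace sketch of the "zero the remainder" idea, in C to match
the patch. `struct leased_buf` and `zero_short_read_tail()` are
hypothetical names for illustration, not io_uring code. It zeroes the
unwritten tail of the requested window [off, off + len) after a short or
failed read, and leaves everything outside that window alone, which is
the gap raised above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a leased buffer; not io_uring code. */
struct leased_buf {
	unsigned char *data;
	size_t size;
};

/*
 * A read/recv targeted [off, off + len) and completed with res
 * (bytes written, or a negative error). Zero the part of that window
 * the IO did not write. Bytes outside the window are left untouched.
 */
static void zero_short_read_tail(struct leased_buf *b, size_t off,
				 size_t len, long res)
{
	size_t done = res > 0 ? (size_t)res : 0;

	assert(off <= b->size && len <= b->size - off);
	if (done > len)
		done = len;
	if (done < len)
		memset(b->data + off + done, 0, len - done);
}
```

With a 16-byte buffer, off = 4, len = 8 and res = 3, bytes 7..11 are
zeroed while bytes 0..3 and 12..15 keep whatever they held before.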
So likely leased buffers should come to io_uring already
initialised, or more specifically it shouldn't contain any data
that the user space (ublk user space) is not supposed to see.
The other way is to track what parts of the buffer were actually
filled.
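A rough sketch of the tracking alternative, again userspace C with
hypothetical names (`struct tracked_buf`, `mark_filled()`,
`scrub_unfilled()`) and an assumed fixed cap `LBUF_MAX`: each completed IO
marks the bytes it actually wrote in a per-byte bitmap, and anything never
marked is zeroed before userspace can see the buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not io_uring code; LBUF_MAX is an assumed cap. */
#define LBUF_MAX 4096

struct tracked_buf {
	unsigned char data[LBUF_MAX];
	unsigned char filled[LBUF_MAX / CHAR_BIT];	/* one bit per data byte */
	size_t size;
};

static void tracked_buf_init(struct tracked_buf *b, size_t size)
{
	assert(size <= LBUF_MAX);
	b->size = size;
	memset(b->filled, 0, sizeof(b->filled));
}

/* Record that an IO actually wrote n bytes starting at off. */
static void mark_filled(struct tracked_buf *b, size_t off, size_t n)
{
	assert(off <= b->size && n <= b->size - off);
	for (size_t i = off; i < off + n; i++)
		b->filled[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

static int is_filled(const struct tracked_buf *b, size_t i)
{
	return (b->filled[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Zero every byte that no IO wrote, before userspace may see the buffer. */
static void scrub_unfilled(struct tracked_buf *b)
{
	for (size_t i = 0; i < b->size; i++)
		if (!is_filled(b, i))
			b->data[i] = 0;
}
```

Byte granularity keeps this correct for arbitrary offsets and lengths; a
coarser granularity would be cheaper but would have to deal with
partially written chunks.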
--
Pavel Begunkov