On 3/22/25 13:50, Ming Lei wrote:
On Sat, Mar 22, 2025 at 12:02:02PM +0000, Pavel Begunkov wrote:
On 3/22/25 07:56, Ming Lei wrote:
So far the fixed kernel buffer is only used for FS read/write, in which
the remaining bytes need to be zeroed in case of a short read; otherwise
kernel data may be leaked to userspace.
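As a rough userspace illustration of the short-read concern (a minimal sketch, not the kernel code; `zero_remainder` and its parameters are hypothetical names chosen for this example): if a read fills only `done` of `len` bytes, whatever was in the tail of the buffer beforehand stays visible to the reader unless it is cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of zeroing the unread tail of a buffer.
 * buf:  destination buffer of len bytes
 * done: number of bytes the (short) read actually filled
 */
static void zero_remainder(unsigned char *buf, size_t len, size_t done)
{
	assert(done <= len);	/* a read cannot fill more than len bytes */
	if (done < len)
		memset(buf + done, 0, len - done);	/* clear stale bytes */
}
```

After a 3-byte short read into an 8-byte buffer, calling `zero_remainder(buf, 8, 3)` leaves the first 3 bytes untouched and clears the remaining 5.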
Can you remind me, how that can happen? Normally, IIUC, you register
a request filled with user pages, so no kernel data there. Is it some
bounce buffers?
For direct io, it is filled with user pages, but it can be buffered IO,
and the page can be mapped to userspace.
I see. I don't mind the patch personally, but I think it's a security
concern: it's still a user space app, even though privileged. Is there
a precedent, maybe in fuse, where we trust the user driver enough to
expose kernel memory?
One option is to try to distinguish when it contains user pages,
and conditionally zero it in ublk beforehand.
But if we consider that it's fine, can ublk zero during the struct
request completion? ublk should already know from the userspace driver
if it failed or whether it's a short IO.
Add two helpers to fix this issue; meanwhile, replace one check
with io_use_fixed_kbuf().
Cc: Caleb Sander Mateos <csander@xxxxxxxxxxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
---
...
+/* zero the remaining bytes of the kernel buffer to avoid leaking data */
+static inline void io_req_zero_remained(struct io_kiocb *req,
+ struct iov_iter *iter)
+{
+ size_t left = iov_iter_count(iter);
+
+ if (left > 0 && iov_iter_rw(iter) == READ)
+ iov_iter_zero(left, iter);
+}
+
#endif
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 039e063f7091..67dc1a6710c9 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -541,6 +541,12 @@ static void __io_complete_rw_common(struct io_kiocb *req, long res)
} else {
req_set_fail(req);
req->cqe.res = res;
+
+ if (io_use_fixed_kbuf(req)) {
+ struct io_async_rw *io = req->async_data;
+
+ io_req_zero_remained(req, &io->iter);
+ }
I think it can be exploited. It's called from ->ki_complete, i.e.
io_complete_rw, so make the request large enough and you're stuck
copying in [soft]irq for too long.
Short reads seldom happen, so how can it be exploited? And the request size
can't be too big in this (ublk) use case.
Denial of service by blocking irq. I'm pretty sure we can construct
a quite large bio / request in the general case, e.g. with huge pages.
Maybe ublk forces splitting, but I wouldn't rely on the ublk
behaviour as it's a generic feature even though currently with
one user. We should move it to the task context, where io_uring
requests end up anyway. I'm pretty sure it can be cleaned up to not
have any overhead later.
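To make the suggestion concrete, here is a userspace-only sketch of the idea (an assumption-laden model, not io_uring code; `struct fake_req`, `fake_complete` and `fake_task_work` are hypothetical names): the completion handler only records that zeroing is needed, and the potentially long memset runs later in the deferred step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a request; not the kernel's struct io_kiocb. */
struct fake_req {
	unsigned char *buf;	/* buffer backing the read */
	size_t len;		/* total bytes requested */
	size_t done;		/* bytes actually read */
	bool need_zero;		/* set at completion, consumed by task work */
};

/* Models the "irq" completion path: record the work, do nothing expensive. */
static void fake_complete(struct fake_req *req, size_t done)
{
	assert(done <= req->len);
	req->done = done;
	req->need_zero = done < req->len;
}

/* Models the "task context" step: the long zeroing happens here instead. */
static void fake_task_work(struct fake_req *req)
{
	if (!req->need_zero)
		return;
	memset(req->buf + req->done, 0, req->len - req->done);
	req->need_zero = false;
}
```

The point of the split is that the cost in the completion path stays constant regardless of request size, while the work proportional to the buffer length is paid where blocking for a while is acceptable.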
--
Pavel Begunkov