Re: [PATCH V8 5/7] io_uring: support leased group buffer with REQ_F_GROUP_KBUF

Pavel Begunkov <asml.silence@xxxxxxxxx> · Mon, 4 Nov 2024 12:23:04 +0000

On 11/4/24 01:21, Ming Lei wrote:
On Mon, Nov 04, 2024 at 01:08:04AM +0000, Pavel Begunkov wrote:
On 11/4/24 00:16, Ming Lei wrote:
...
I agree, it's not hot, it's a failure path, and the recv side
is of medium hotness, but the main concern is that the feature
is too actively leaking into other requests.
The point is whether you'd like to support kernel buffers. If yes, this
kind of change can't be avoided.

There is no guarantee with the patchset that there will be any IO done
with that buffer, e.g. place a nop into the group, and even then you

Yes, here it depends on the user. In the case of ublk, the application has to
be trusted, and the situation is the same with other user-emulated storage,
such as qemu.

have offsets and length, so it's not clear what the zeroing is supposed
to achieve.

The buffer may be one page cache page; if it isn't initialized
completely, kernel data may be leaked to userspace via mmap.

Either the buffer comes fully "initialised", i.e. free of
kernel private data, or we need to track what parts of the buffer were
used.

That is why the only workable way is to zero the remainder in
consumer of OP, imo.

If it can leak kernel data in some way, I'm afraid zeroing of the
remainder alone won't be enough to prevent it, e.g. the recv/read
len doesn't have to match the buffer size.

The leased kernel buffer size is fixed, and the recv/read len is known
in case of short read/recv, the remainder part is known too, so can you
explain why zeroing remainder alone isn't enough?

"The buffer may be one page cache page; if it isn't initialized
completely, kernel data may be leaked to userspace via mmap."

I don't know the exact path you meant in this sentence, but let's
take an example:

1. The leaser, e.g. ublk cmd, allocates an uninitialised page and
leases it to io_uring.

2. User space (e.g. ublk user space impl) does some IO to fill
the buffer, but it's buggy or malicious and fills only half of
the buffer:

recv(leased_buffer, offset=0, len = 2K);

So, one half is filled with data, the other half is still not
initialised.

io_req_zero_remained() is added in this patch and called after the
half is done for both io_read() and net recv().

It zeroes what's left of the current request, but requests
don't have to cover the entire buffer.

3. The lease ends, and we copy the full 4K back to user space with the
uninitialised chunk.
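
To make steps 1-3 concrete, here is a userspace-only sketch of the scenario. It is not kernel code and not taken from the patchset: struct leased_buf, the three helpers, and the 0xAA "stale data" marker are all hypothetical names chosen for illustration, and the zeroing only mirrors what io_req_zero_remained() is described as doing above (zero the unfilled tail of the current request).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEASED_BUF_SIZE 4096
#define STALE_BYTE 0xAA /* stands in for leftover kernel-private data */

/* Hypothetical model of a leased buffer; not a kernel structure. */
struct leased_buf {
	unsigned char data[LEASED_BUF_SIZE];
};

/* Step 1: the leaser hands out a page that was never initialised. */
static void lease_uninitialised(struct leased_buf *b)
{
	memset(b->data, STALE_BYTE, sizeof(b->data));
}

/*
 * Step 2: a single request covering [off, off + len) that completes
 * short after `done` bytes. Only the tail of this request is zeroed.
 */
static void short_recv_then_zero(struct leased_buf *b, size_t off,
				 size_t len, size_t done)
{
	assert(off <= LEASED_BUF_SIZE && len <= LEASED_BUF_SIZE - off);
	assert(done <= len);
	memset(b->data + off, 'D', done);            /* received payload */
	memset(b->data + off + done, 0, len - done); /* request remainder */
}

/* Step 3: count bytes that still hold stale data when the lease ends. */
static size_t count_stale(const struct leased_buf *b)
{
	size_t n = 0;

	for (size_t i = 0; i < sizeof(b->data); i++)
		if (b->data[i] == STALE_BYTE)
			n++;
	return n;
}
```

With a fresh lease followed by short_recv_then_zero(&b, 0, 2048, 1024), count_stale(&b) is 2048: the request's own tail is zeroed, but the half of the buffer that no request touched is not.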

You can correct me on ublk specifics, I assume 3. is not a copy and
the user in 3 is the one using a ublk block device, but the point I'm
making is that if something similar is possible, then just zeroing is not
enough, the user can skip the step filling the buffer. If it can't leak

Can you explain how the user skips the step, given the read IO is a member of the group?

(2) illustrates it; it can also be a nop with no read/recv.

any private data, then the buffer should've already been initialised by
the time it was leased. Initialised is in the sense that it contains no

For block IO the practice is to zero the remainder after a short read; please
see the example of loop, lo_complete_rq() & lo_read_simple().

It's more important for me to understand what it tries to fix, whether
we can leak kernel data without the patch, and whether it can be exploited
even with the change. We can then decide if it's nicer to zero or not.

I can also ask it in a different way: can you tell me whether there is a
security concern if there is no zeroing? And if so, can you describe the
exact way it can be triggered?

--
Pavel Begunkov