On Mon, Nov 04, 2024 at 01:24:09PM +0000, Pavel Begunkov wrote:
> On 11/4/24 13:08, Ming Lei wrote:
> > On Mon, Nov 04, 2024 at 12:23:04PM +0000, Pavel Begunkov wrote:
> > > On 11/4/24 01:21, Ming Lei wrote:
> ...
> > > > > 3. The lease ends, and we copy full 4K back to user space with the
> > > > > unitialised chunk.
> > > > >
> > > > > You can correct me on ublk specifics, I assume 3. is not a copy and
> > > > > the user in 3 is the one using a ublk block device, but the point I'm
> > > > > making is that if something similar is possible, then just zeroing is not
> > > > > enough, the user can skip the step filling the buffer. If it can't leak
> > > >
> > > > Can you explain how user skips the step given read IO is member of one group?
> > >
> > > (2) Illustrates it, it can also be a nop with no read/recv
> >
> > As I explained before, the application has to be trusted, and it must
> > have the permission to open the device & call into the buffer lease
> > uring_cmd.
>
> It might be trusted to read some data of the process using the
> device, but obviously it can't be trusted to read random kernel data.
> I'm trying to understand which one is that.

Take ublk as an example: a READ IO arrives on /dev/ublkbN, and the IO
command is forwarded to userspace for handling:

- the application (ublk server) reads data from another file/socket
  into the kernel buffer of the IO command via an io_uring io group, in
  order to handle the READ IO

- the leader uring_cmd leases the kernel buffer to io_uring

- member OPs read from FS or socket into the leased kernel buffer, and
  zero the remaining part in case of a short read/recv

And how can an application read random kernel data? That would
definitely be a security problem.

>
> > It is in same situation with any user emulated storage, such as qemu,
> > fuse, and the application has to do things right.
> >
> > >
> > > > > any private data, then the buffer should've already been initialised by
> > > > > the time it was lease. Initialised is in the sense that it contains no
> > > >
> > > > For block IO the practice is to zero the remainder after short read, please
> > > > see example of loop, lo_complete_rq() & lo_read_simple().
> > >
> > > It's more important for me to understand what it tries to fix, whether
> > > we can leak kernel data without the patch, and whether it can be exploited
> > > even with the change. We can then decide if it's nicer to zero or not.
> > >
> > > I can also ask it in a different way, can you tell is there some security
> > > concern if there is no zeroing? And if so, can you describe what's the exact
> > > way it can be triggered?
> >
> > Firstly the zeroing follows loop's handling for short read
>
> > Secondly, if the remainder part of one page cache buffer isn't zeroed, it might
> > be leaked to userspace via another read() or mmap() on same page.
>
> What kind of data this leaked buffer can contain? Is it uninitialised
> kernel memory like a freshly kmalloc'ed chunk would have? Or is it private
> data of some user process?

Yes, the page may be uninitialized, and might contain random kernel data.

Thanks,
Ming
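
For illustration only, the "zero the remainder after a short read" practice
discussed above can be sketched in plain userspace C. This is not the loop
driver code and not the io_uring buffer-lease path; the helper name
read_and_zero_tail() is hypothetical, and an ordinary read() stands in for
the member OP that fills the leased buffer.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to fill buf[0..len) from fd. Whatever part
 * of the buffer is not filled (EOF or error) is zeroed, so stale bytes
 * are never left behind for a later reader of the buffer.
 * Returns the number of bytes actually read, or -1 on error.
 */
static ssize_t read_and_zero_tail(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;
    int failed = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            failed = 1;
            break;
        }
        if (n == 0)
            break;              /* EOF: short read */
        done += (size_t)n;
    }

    /* Zero the unfilled remainder, including on the error path. */
    memset(p + done, 0, len - done);

    return failed ? -1 : (ssize_t)done;
}
```

The point of the sketch is only that the tail is cleared unconditionally. It
says nothing about whether the buffer could be exposed when the fill step is
skipped entirely, which is the question raised in the quoted text.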