At 2023-11-13 21:26:21, "Theodore Ts'o" <tytso@xxxxxxx> wrote:
>On Mon, Nov 13, 2023 at 10:15:05AM +0100, David Hildenbrand wrote:
>>
>> According to the man page:
>>
>> "The memory areas backing the file created with memfd_secret(2) are visible
>> only to the processes that have access to the file descriptor. The memory
>> region is removed from the kernel page tables and only the page tables of
>> the processes holding the file descriptor map the corresponding physical
>> memory. (Thus, the pages in the region can't be accessed by the kernel
>> itself, so that, for example, pointers to the region can't be passed to
>> system calls.)
>>
>> I'm not sure if the last part is actually true, if the syscalls end up
>> walking user page tables to copy data in/out.
>
>The idea behind removing it from the kernel page tables is so that
>kernel code running in some other process context won't be able to
>reference the memory via the kernel address space. (So if there is
>some kind of kernel zero-day which allows arbitrary code execution,
>the injected attack code would have to play games with page tables
>before being able to reference the memory --- this is not
>*impossible*, just more annoying.)
>
>But if you are doing a buffered write, the copy from the user-supplied
>buffer to the page cache is happening in the process's context. So
>"foreground kernel code" can dereference the user-supplied pointer
>just fine.
>

But the inconsistent treatment in the kernel, where the memfd is denied
while the mmapped address is allowed, is kind of confusing...
I thought those two should be treated the same way.

Thanks
David Wang