On Fri, Dec 09, 2022 at 04:04:47PM +0000, jeffxu@xxxxxxxxxxxx wrote:
> From: Jeff Xu <jeffxu@xxxxxxxxxx>
>
> Since Linux introduced the memfd feature, memfd have always had their
> execute bit set, and the memfd_create() syscall doesn't allow setting
> it differently.
>
> However, in a secure by default system, such as ChromeOS, (where all
> executables should come from the rootfs, which is protected by Verified
> boot), this executable nature of memfd opens a door for NoExec bypass
> and enables “confused deputy attack”. E.g, in VRP bug [1]: cros_vm
> process created a memfd to share the content with an external process,
> however the memfd is overwritten and used for executing arbitrary code
> and root escalation. [2] lists more VRP in this kind.
>
> On the other hand, executable memfd has its legit use, runc uses memfd’s
> seal and executable feature to copy the contents of the binary then
> execute them, for such system, we need a solution to differentiate runc's
> use of executable memfds and an attacker's [3].
>
> To address those above, this set of patches add following:
> 1> Let memfd_create() set X bit at creation time.
> 2> Let memfd to be sealed for modifying X bit.
> 3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
> X bit.For example, if a container has vm.memfd_noexec=2, then
> memfd_create() without MFD_NOEXEC_SEAL will be rejected.
> 4> A new security hook in memfd_create(). This make it possible to a new
> LSM, which rejects or allows executable memfd based on its security policy.

I think patch 1-5 look good to land. The LSM hook seems separable, and
could continue on its own. Thoughts?

(Which tree should memfd change go through?)

-Kees

-- 
Kees Cook