On Fri, Aug 05, 2022 at 10:21:21PM +0000, jeffxu@xxxxxxxxxx wrote:
> This v2 series MFD_NOEXEC, this series includes:
> 1> address comments in V1
> 2> add sysctl (vm.mfd_noexec) to change the default file permissions
>    of memfd_create to be non-executable.
>
> Below are cover-level for v1:
>
> The default file permissions on a memfd include execute bits, which
> means that such a memfd can be filled with a executable and passed to
> the exec() family of functions. This is undesirable on systems where all
> code is verified and all filesystems are intended to be mounted noexec,
> since an attacker may be able to use a memfd to load unverified code and
> execute it.

I would absolutely like to see some kind of protection here. However,
I'd like a more specific threat model. What are the cases where the X
bit has been abused (e.g. [1])? What are the cases where the X bit is
needed (e.g. [2])? With those in mind, it should be possible to draw
a clear line between the two cases. (e.g. we need to avoid a confused
deputy attack where an "unprivileged" user can pass an executable memfd
to a "privileged" user. How those privileges are defined may matter a
lot based on how memfds are being used. For example, can runc's use of
executable memfds be distinguished from an attacker's?)

> Additionally, execution via memfd is a common way to avoid scrutiny for
> malicious code, since it allows execution of a program without a file
> ever appearing on disk. This attack vector is not totally mitigated with
> this new flag, since the default memfd file permissions must remain
> executable to avoid breaking existing legitimate uses, but it should be
> possible to use other security mechanisms to prevent memfd_create calls
> without MFD_NOEXEC on systems where it is known that executable memfds
> are not necessary.

This reminds me of dealing with non-executable stacks.
There ended up being three states:

- requested to be executable (PT_GNU_STACK X)
- requested to be non-executable (PT_GNU_STACK NX)
- undefined (no PT_GNU_STACK)

The first two are clearly defined, but the third needed a lot of special
handling. For a "safe by default" world, the third should be "NX", but
old stuff depended on it being "X".

Here, we have a bit being present or not, so we only have a binary
state. I'd much rather the default be NX (no bit set) instead of making
every future (safe) user of memfd have to specify MFD_NOEXEC.

It's also easier on a filtering side to say "disallow memfd_create with
MFD_EXEC", but how do we deal with the older software?

If the default perms of memfd_create()'s exec bit is controlled by a
sysctl and the sysctl is set to "leave it executable", how does a user
create an NX memfd? (i.e. setting MFD_EXEC means "exec" and not setting
it means "exec" also.) Are two bits needed? Seems wasteful.
MFD_I_KNOW_HOW_TO_SET_EXEC | MFD_EXEC, etc...

For F_SEAL_EXEC, it seems this should imply F_SEAL_WRITE if forced
executable to avoid WX mappings (i.e. provide W^X from the start).

-Kees

[1] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20memfd%20escalation&can=1
[2] https://lwn.net/Articles/781013/

-- 
Kees Cook